Human and Machine Learning

Due 11/17/2015

The
goal of this assignment is to give you a
first-hand appreciation of the Dirichlet process mixture
model. You will use the Chinese restaurant process for
computing probabilities and sampling.

Make a
graph showing the probability that a new
customer will be assigned to an empty table in the CRP, as a function
of the
number of customers already in the restaurant. For example,
when
there are 0 customers in the restaurant, the next customer -- customer
1 -- will be assigned to a new table with probability 1.
Use α = 0.5, and plot your graph for up to
500
customers.

Hint: If you're thinking about running a sampler, you should think again. You can compute this probability analytically.

Hint: If you're thinking about running a sampler, you should think again. You can compute this probability analytically.

Using
the CRP, draw samples from a Dirichlet process mixture model with
Gaussian components in a 2 dimensional observation space. For this
draw, use α = 0.5 and G0 = (X,Y) with X ~
Uniform(0,1) and Y ~ Uniform(0,1) specifying the Gaussian means. Use
σ = 0.1 for all
components. Plot some sample draws from
the DPMM with 500 points. Below are 3 examples that I generated. I
color coded the points to indicate the compnent that generated the
observation..

If you
are ambitious, write a Gibbs sampler to infer the underlying clustering
of a set of data points assumed to have been generated by a Dirichlet
process mixture model. I have created a set of data points
either in matlab
format or in raw
text format. Along with the
(x,y) coordinates of each point is a label indicating the cluster that
generated the point. Here are the data:

Running the sampler involves first assigning each customer (data point) to a table (cluster). Then you'll pick random customers, remove them from the restaurant, and draw from the CRP posterior to pick a new table for them. For both the initial assignment and the re-assignment, the posterior probability of assigning a customer to a table is proportional to product of (1) the CRP prior, which is based on table occupancies and α, and (2) the likelihood of the observed (x,y) value given estimated parameters for each table.

Several points of advice:

(1) For a customer assigned to a new table, you'll have to compute a likelihood of their observed (x,y) value. To handle this case, I drew parameters for the new (empty) table from the prior distribution G0, which yields a likelihood that is dependent on the randomly drawn cluster center from the customer's (x,y) value.

(2) In class, we discussed resampling the table assignments. But you'll also have to resample the table parameters (the cluster centers). There are two ways to do this:

The implementation requires a lot of bookkeeping. Using the maximum a posteriori parameter estimates, each time a customer is removed from the restaurant for reassignment, you'll want to recompute the parameter values associated with that customer's former table, and similarly each time a customer is added to a table. You'll also have to deal with the fact that as customers shift around, occupied tables can become unoccupied. Unoccupied tables must be deleted. The CRP should allow only one unoccupied table at a time.

Here are two examples of inferred clusters which I obtained by running the sampler for 100 updates of each customer's table assignment. The example on the left is pretty good; the example on the right is not great.

Running the sampler involves first assigning each customer (data point) to a table (cluster). Then you'll pick random customers, remove them from the restaurant, and draw from the CRP posterior to pick a new table for them. For both the initial assignment and the re-assignment, the posterior probability of assigning a customer to a table is proportional to product of (1) the CRP prior, which is based on table occupancies and α, and (2) the likelihood of the observed (x,y) value given estimated parameters for each table.

Several points of advice:

(1) For a customer assigned to a new table, you'll have to compute a likelihood of their observed (x,y) value. To handle this case, I drew parameters for the new (empty) table from the prior distribution G0, which yields a likelihood that is dependent on the randomly drawn cluster center from the customer's (x,y) value.

(2) In class, we discussed resampling the table assignments. But you'll also have to resample the table parameters (the cluster centers). There are two ways to do this:

- use Gibbs sampling: the posterior conditioned on all the table assignments turns out to be pretty simple given the way I set up the generative model. It is a Gaussian distribution centered on the mean (x,y) of all customers at the table, with σ = 0.1.
- use the maximum a posteriori estimate, which is just the mean (x,y) of all customers at the table

The implementation requires a lot of bookkeeping. Using the maximum a posteriori parameter estimates, each time a customer is removed from the restaurant for reassignment, you'll want to recompute the parameter values associated with that customer's former table, and similarly each time a customer is added to a table. You'll also have to deal with the fact that as customers shift around, occupied tables can become unoccupied. Unoccupied tables must be deleted. The CRP should allow only one unoccupied table at a time.

Here are two examples of inferred clusters which I obtained by running the sampler for 100 updates of each customer's table assignment. The example on the left is pretty good; the example on the right is not great.