Probabilistic Models of Human and Machine Intelligence

Fall 2015

Due Sep 10

The goal of this assignment is to give you a concrete understanding of the Tenenbaum (1999) work by implementing and experimenting with a small-scale version of the model applied to learning concepts in two-dimensional spaces. A further goal is to get hands-on experience representing and manipulating probability distributions and using Bayes' rule.

Consider a two-dimensional input space with features in the range [-10, 10]. We will consider only square concepts centered on the origin (0, 0), and only a discrete set of such concepts, H = {h_{i}, i = 1...10}, where h_{i} is the concept with lower-left corner (-i, -i) and upper-right corner (+i, +i), i.e., a square with sides of length 2i.

You will have to define a discrete prior distribution over the 10 hypotheses, and you will have to do prediction by marginalizing over the hypothesis space. Use Tenenbaum's expected-size prior. Because the expected-size prior is defined over a continuous space of hypotheses, you will need to compute the value of the prior for each of the 10 hypotheses and renormalize the resulting values so that the prior distribution sums to 1. (You don't actually need to do this renormalization, because the normalization factor cancels out when you do the Bayesian calculations, but go ahead and do it anyhow, just to have a clean representation of the priors.)
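
As a starting point, the discretized prior might be sketched as follows. This is only a sketch under a stated assumption: it takes the expected-size prior to have the exponential form exp(-(s1/𝜎_{1} + s2/𝜎_{2})), where s1 and s2 are the side lengths of the rectangle, so check it against the form given in the paper. The function name `expected_size_prior` and its parameters are illustrative and not part of the assignment.

```python
import numpy as np

def expected_size_prior(sigma1, sigma2, n_hyp=10):
    """Discrete prior over h_1..h_n_hyp, normalized to sum to 1.

    ASSUMPTION: the continuous prior is exp(-(s1/sigma1 + s2/sigma2)),
    evaluated at side lengths s1 = s2 = 2i for hypothesis h_i.
    """
    i = np.arange(1, n_hyp + 1)          # hypothesis indices 1..n_hyp
    side = 2.0 * i                       # side length of square h_i
    unnormalized = np.exp(-(side / sigma1 + side / sigma2))
    return unnormalized / unnormalized.sum()

# Hypothetical usage: one prior per sigma setting.
prior_narrow = expected_size_prior(4.0, 4.0)
prior_broad = expected_size_prior(10.0, 10.0)
```

Each returned array holds one probability per hypothesis, in the order h_{1} through h_{10}.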

Make a bar graph of the prior distribution, P(H), for 𝜎_{1} = 𝜎_{2} = 4. Make a graph of the prior distribution for 𝜎_{1} = 𝜎_{2} = 10.

Given one observation, X = {(1.5, 0.5)}, compute the posterior P(H|X) with 𝜎 = 10 (that is, 𝜎_{1} = 𝜎_{2} = 10). You will get one probability for each possible hypothesis. Display your result either as a bar graph or as a list of probabilities.
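
One way the posterior computation could be organized is sketched below. It rests on several assumptions that you should verify against the paper: the exponential form of the expected-size prior, 𝜎_{1} = 𝜎_{2} = 𝜎, a size-principle likelihood of 1/area per example for hypotheses that contain every example (and 0 otherwise), and squares that include their boundary. The name `posterior` is illustrative.

```python
import numpy as np

def posterior(X, sigma, n_hyp=10):
    """P(h_i | X) for i = 1..n_hyp, as a length-n_hyp array.

    ASSUMPTIONS: prior proportional to exp(-(s/sigma + s/sigma)) with
    s = 2i, likelihood (1/area)^n for consistent hypotheses (size
    principle), and closed squares (boundary points count as inside).
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))   # shape (n, 2)
    n = X.shape[0]
    i = np.arange(1, n_hyp + 1)
    side = 2.0 * i
    prior = np.exp(-2.0 * side / sigma)
    # h_i contains every example iff the largest |coordinate| is <= i.
    consistent = i >= np.abs(X).max()
    likelihood = np.where(consistent, (side ** 2) ** (-float(n)), 0.0)
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum()

post = posterior([(1.5, 0.5)], sigma=10.0)
```

Normalizing at the end is Bayes' rule: the denominator P(X) is just the sum of prior times likelihood over the 10 hypotheses.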

Using the results of Task 2, compute generalization predictions, P(y|X), over the whole input space for X = {(1.5, 0.5)} and 𝜎 = 10. The input space should span the region from (-10, -10) to (+10, +10). Display your result as a contour map in 2D space where the coloring of the contour map represents the probability that an input at that point in the space will be a member of the concept. (If the probabilities are becoming very small, you may wish to show log(probability) in the contour map to allow for a wider dynamic range.)
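
The marginalization over hypotheses can be sketched as a weighted sum of membership indicators: P(y|X) is the total posterior mass of the hypotheses that contain y. The code below is a minimal sketch under the same assumptions as before (exponential expected-size prior, size-principle likelihood, closed squares); `posterior`, `generalization`, and the grid resolution are illustrative choices.

```python
import numpy as np

def posterior(X, sigma, n_hyp=10):
    # ASSUMPTION: exponential expected-size prior, size-principle likelihood.
    X = np.atleast_2d(np.asarray(X, dtype=float))
    i = np.arange(1, n_hyp + 1)
    side = 2.0 * i
    prior = np.exp(-2.0 * side / sigma)
    consistent = i >= np.abs(X).max()
    like = np.where(consistent, (side ** 2) ** (-float(X.shape[0])), 0.0)
    unnormalized = prior * like
    return unnormalized / unnormalized.sum()

def generalization(post, grid):
    """P(y in concept | X) at every point of grid x grid."""
    i = np.arange(1, len(post) + 1)
    y1, y2 = np.meshgrid(grid, grid)               # each of shape (G, G)
    reach = np.maximum(np.abs(y1), np.abs(y2))     # smallest half-width containing y
    member = reach[..., None] <= i                 # shape (G, G, n_hyp)
    return (member * post).sum(axis=-1)            # sum posterior mass over h containing y

grid = np.linspace(-10.0, 10.0, 41)                # hypothetical 0.5-unit resolution
post = posterior([(1.5, 0.5)], sigma=10.0)
surface = generalization(post, grid)
```

The resulting `surface` array can be passed, together with `grid`, to a contour-plotting routine such as matplotlib's `contourf`; take the logarithm first if you want a wider dynamic range.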

Repeat Task 3 for X = {(4.5, 2.5)}.

Compute generalization predictions, P(y|X), over the whole input space for 𝜎 = 20 and three different sets of input examples: X = {(2.2, -0.2)}, X = {(2.2, -0.2), (0.5, 0.5)}, and X = {(2.2, -0.2), (0.5, 0.5), (1.5, 1)}. Describe how the posterior is changing as new examples are added, and explain why this occurs.

Do some other interesting experiment with the model. One possibility would be to extend the model to accommodate negative as well as positive examples. Another possibility would be to compare generalization surfaces with and without the size principle, and with an uninformative prior (here, uniform would work) compared to the expected-size prior.

I would like hardcopies of your work. It's easier for you than putting together a single Word / PDF document, and it's easier for me than keeping tabs on electronic documents. You may hand in code if you like, but that is not necessary.