Probabilistic Models of Human and Machine Intelligence

Part I Due Tue March 20, 2018

Part II Due Thu March 22, 2018

The goal of
this assignment is to introduce you to probabilistic
programming languages. These languages allow you to
specify graphical models with random variables and to perform
inference by sampling.
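To make "inference by sampling" concrete, here is a minimal sketch in plain Python, without any probabilistic programming library. It assumes a tiny rain/sprinkler/wet-grass network whose probabilities are made up purely for illustration; the function names are also hypothetical. It estimates P(rain | grass is wet) by rejection sampling and provides an exact enumeration for comparison.

```python
import random

# Hypothetical CPTs for a rain/sprinkler/wet-grass network.
# These numbers are illustrative assumptions, not values from the course.
P_RAIN = 0.2
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}
P_WET_GIVEN = {  # keyed by (sprinkler, rain)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def sample_network(rng):
    """Draw one joint sample (rain, sprinkler, wet) by ancestral sampling."""
    rain = rng.random() < P_RAIN
    sprinkler = rng.random() < P_SPRINKLER_GIVEN_RAIN[rain]
    wet = rng.random() < P_WET_GIVEN[(sprinkler, rain)]
    return rain, sprinkler, wet


def estimate_rain_given_wet(n_samples, seed=0):
    """Rejection sampling: keep only samples consistent with the evidence wet=True."""
    rng = random.Random(seed)
    kept = 0
    rainy = 0
    for _ in range(n_samples):
        rain, _sprinkler, wet = sample_network(rng)
        if wet:
            kept += 1
            rainy += int(rain)
    return rainy / kept


def exact_rain_given_wet():
    """Exact answer by summing the joint over the unobserved sprinkler variable."""
    numerator = 0.0
    evidence = 0.0
    for rain in (True, False):
        p_rain = P_RAIN if rain else 1.0 - P_RAIN
        for sprinkler in (True, False):
            p_s = P_SPRINKLER_GIVEN_RAIN[rain]
            p_sprinkler = p_s if sprinkler else 1.0 - p_s
            joint = p_rain * p_sprinkler * P_WET_GIVEN[(sprinkler, rain)]
            evidence += joint
            if rain:
                numerator += joint
    return numerator / evidence
```

Calling `estimate_rain_given_wet(200000)` should land close to `exact_rain_given_wet()`, with the gap shrinking as the number of samples grows. A probabilistic programming language automates both halves of this: you declare the model, and the inference engine does the sampling.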

I want you to investigate probabilistic
programming languages and identify one that you want to work
with. Install the software, and run through one or more
tutorial examples to convince yourself that you understand
the basics of how the language works. I list a number of options on
the course home page, and there are even more at probabilistic-programming.org.

*For Part I, there is nothing to hand in.*

I have no expertise in these languages, but after spending a few days looking at the options, I'd suggest five for further investigation. The first three are likely to be the most valuable in the future because they allow Bayesian methods to be integrated with neural networks. The field is definitely headed in this direction. Each of these three languages is built on top of a gradient-based optimization library with efficient GPU operations on multidimensional arrays. The five languages I'd recommend, roughly ordered from strongest to weakest recommendation, are:

- Edward: Robust language, well-written documentation, built on top of TensorFlow. If you know TensorFlow, Edward is the way to go. TensorFlow will have a long life, so it's probably worth learning. Here is the manual.
- Pyro: This is a relatively new language that was released by Uber. It sits on top of PyTorch. Pyro has a well-written tutorial, though the language doesn't seem as intuitive to me as Edward. The advantage is that PyTorch is more natural than TensorFlow.

- PyMC3: I really like this language. It seems fairly intuitive, and models are easy to translate into code. Like Pyro and Edward, it is built on top of an optimization library, Theano. Unfortunately, Theano is no longer being developed, so I worry that PyMC3 will die. (Hopefully, I'm wrong and someone will port it to TensorFlow.)

- OpenBUGS: A very popular language before Torch/TensorFlow/Theano came along. The BUGS code is quite readable and maps closely to the notation we've used in class. However, the documentation is not nearly as well put together as the documentation for Edward and PyMC3. Also, BUGS is missing the hooks to neural nets that Edward, Pyro, and PyMC3 have.

- Stan: A very general-purpose statistical modeling language that interfaces well with Python, as well as lots of other data analysis languages. Because it is so general purpose, it is also the most intricate and extensive language. The documentation is 600+ pages.

Perform either exact or approximate inference
to obtain answers to part III of Assignment 4. You already solved
this inference problem exactly, and the answers should be P(G_{1}=2|X_{2}=50)
= 0.1054 and p(X_{3}=50|X_{2}=50) =
0.1024.

If you're going to use Edward, I wasn't able to
get any of the sampling-based inference procedures
(Metropolis-Hastings, Gibbs, hybrid Monte Carlo) to work on
discrete RVs; however, KLpq does seem to get a solution, as long
as you include the argument n_samples with a value of 100 or larger. Because
there aren't any good examples of discrete RVs in Edward, we
found this
implementation of the sprinkler/rain graphical model to
be helpful. Read the description of KLpq carefully: it does a
search over Gaussian RVs, so you need to constrain the variable
if you want it to be nonnegative or binary. We also found that
for estimating p(X_{3}=50|X_{2}=50), the
distribution needs to be initialized in the right
neighborhood.
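The remark about constraining a variable can be illustrated generically. The sketch below is plain Python and does not use Edward's API; it shows two standard transformations that map an unconstrained real value (such as a draw from a Gaussian) onto a restricted range. The function names are hypothetical, and whether your chosen language applies such a transform for you is something to check in its documentation.

```python
import math


def sigmoid(z):
    """Map any real z into the interval [0, 1], e.g. for a Bernoulli probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow for very negative z
    return ez / (1.0 + ez)


def softplus(z):
    """Map any real z to a nonnegative value, e.g. for a rate or scale parameter."""
    # Numerically stable form of log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

For a binary variable, one option is to let the unconstrained value parameterize the probability of the variable being 1 via `sigmoid`; for a nonnegative quantity, `softplus` (or an exponential) plays the same role.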

If you get really stuck and can't get this example to run, implement the burglar alarm network from class and show some inference results. The burglar alarm should be a straightforward extension of the sprinkler/rain net. We will give a max of 80% credit for this model.
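As a reference point for checking sampled answers on the fallback problem, here is a minimal sketch of exact inference by enumeration on a burglar alarm network. It assumes the five-variable structure (burglary, earthquake, alarm, and two neighbors who call) and the conditional probabilities commonly given in textbooks; the version presented in class may use different variables or numbers, so treat every value below as an assumption.

```python
from itertools import product

# Assumed CPTs (common textbook values; the in-class network may differ).
P_BURGLARY = 0.001
P_EARTHQUAKE = 0.002
P_ALARM = {  # keyed by (burglary, earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_JOHN_CALLS = {True: 0.90, False: 0.05}  # keyed by alarm
P_MARY_CALLS = {True: 0.70, False: 0.01}  # keyed by alarm


def bernoulli(p, value):
    """Probability that a binary variable with P(True)=p takes the given value."""
    return p if value else 1.0 - p


def joint(b, e, a, j, m):
    """Joint probability of one full assignment, factored along the network."""
    return (
        bernoulli(P_BURGLARY, b)
        * bernoulli(P_EARTHQUAKE, e)
        * bernoulli(P_ALARM[(b, e)], a)
        * bernoulli(P_JOHN_CALLS[a], j)
        * bernoulli(P_MARY_CALLS[a], m)
    )


def posterior_burglary(j=True, m=True):
    """P(burglary | john_calls=j, mary_calls=m), summing out earthquake and alarm."""
    numerator = 0.0
    evidence = 0.0
    for b, e, a in product((True, False), repeat=3):
        p = joint(b, e, a, j, m)
        evidence += p
        if b:
            numerator += p
    return numerator / evidence
```

With these assumed numbers, `round(posterior_burglary(), 3)` gives 0.284, so a sampling-based run in your chosen language should converge to roughly that value if it uses the same probabilities.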

As I mentioned on Piazza, one student has had success with PyMC3 and the code produced was quite sensible and readable.

For Part II, we would like you to hand in your code and the runs that produce the two answers.
