Assignment 1
Probabilistic Models of Human and Machine Intelligence

CSCI 7222
Fall 2013

Assigned 8/27/13
Due 9/3/13

Goal

The goal of this assignment is to give you a bit of practice manipulating data, using Bayes' rule, and constructing a naive Bayes classifier.

Data Set

The Titanic data set gives the values of four categorical attributes for each of the 2201 people on board the Titanic when it struck an iceberg and sank. The attributes are social class (first class, second class, third class, crew member), age (adult or child), gender, and whether or not the person survived. The Titanic data set is available here.

Task 1

Build a probability table indicating P(Death | Gender, Age, Class) for each combination of class, age, and gender. Display this table in the following way:

          Male                Female
          Child     Adult     Child     Adult
First
Second
Third
Crew

The rows of the table represent the different classes, and the columns represent the different combinations of gender and age. In each cell of the table, insert the conditional probability. Warning: Be alert to the possibility of a cell containing no data, in which case the conditional probability is undefined.
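One way to fill the probability table is to count, for each class/age/gender cell, how many people were in that cell and how many of them died. The sketch below assumes a hypothetical record format (a list of (cls, age, gender, survived) tuples with lowercase labels), since the layout of the data file is not specified here; adapt the parsing to the actual file. Empty cells come back as None rather than as a number.

```python
from collections import Counter
from itertools import product

# Assumed (hypothetical) labels; rename them to match the actual data file.
CLASSES = ["first", "second", "third", "crew"]
AGES = ["child", "adult"]
GENDERS = ["male", "female"]


def death_probability_table(records):
    """Map (cls, age, gender) -> P(Death | gender, age, cls), or None if no data.

    `records` is an iterable of hypothetical (cls, age, gender, survived)
    tuples, where `survived` is a bool.
    """
    totals = Counter()  # number of people in each cell
    deaths = Counter()  # number of those people who died
    for cls, age, gender, survived in records:
        key = (cls, age, gender)
        totals[key] += 1
        if not survived:
            deaths[key] += 1

    table = {}
    for key in product(CLASSES, AGES, GENDERS):
        n = totals[key]
        # A cell with no people has an undefined conditional probability.
        table[key] = deaths[key] / n if n else None
    return table


def format_table(table):
    """Render one row per class; columns are Male Child/Adult, Female Child/Adult."""
    columns = [(gender, age) for gender in GENDERS for age in AGES]
    lines = ["        " + "".join(f"{g[0].upper()}-{a:<8}" for g, a in columns)]
    for cls in CLASSES:
        cells = []
        for gender, age in columns:
            p = table[(cls, age, gender)]
            text = "n/a" if p is None else f"{p:.3f}"
            cells.append(f"{text:<10}")
        lines.append(f"{cls:<8}" + "".join(cells))
    return "\n".join(lines)
```

For example, print(format_table(death_probability_table(records))) prints one row per class in the same layout as the table above, with "n/a" marking cells that contain no data.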

After you’ve built the probability table, make a second table, a classification table, which predicts death or survival for each feature combination. If P(Death | Gender, Age, Class) > .5, then label that cell as death; otherwise label that cell as survival.
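The classification table is a threshold applied to each cell of the probability table. In this sketch, classify is a hypothetical helper that takes a dict keyed by (class, age, gender) whose values are probabilities or None. The assignment does not say how to label a cell with no data, so leaving such cells unlabeled (None) is an assumption of this sketch.

```python
def classify(prob_table, threshold=0.5):
    """Label each cell "death" if P(Death | ...) > threshold, else "survival".

    Cells with no data (probability None) are left as None rather than guessed.
    """
    labels = {}
    for key, p in prob_table.items():
        if p is None:
            labels[key] = None
        else:
            labels[key] = "death" if p > threshold else "survival"
    return labels
```

Note that a cell with a probability of exactly 0.5 is labeled survival, because the rule requires P(Death | Gender, Age, Class) to be strictly greater than 0.5.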

Task 2

Build a Naive Bayes classifier. To build the classifier, you must first construct six one-dimensional tables: P(Class | Death), P(Age | Death), P(Gender | Death), P(Class | Survival), P(Age | Survival), P(Gender | Survival). To be clear on this notation: the table for P(Age | Death) should have two rows, one for adult and one for child, and for each age group you should compute the probability that a person who died belongs to that age group. Also compute the unconditional probabilities P(Death) and P(Survival), with P(Death) + P(Survival) = 1. From this information, compute P(Death | Gender, Age, Class) using the Naive Bayes assumption, namely that class, age, and gender are conditionally independent given the outcome. In addition to the probability table, build the classification table as well.
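Under the Naive Bayes assumption, Bayes' rule gives

P(Death | G, A, C) = P(G | Death) P(A | Death) P(C | Death) P(Death) / Z,

where Z is that same numerator plus P(G | Survival) P(A | Survival) P(C | Survival) P(Survival). The sketch below estimates P(Death) and the six one-dimensional tables by counting and then applies this formula. It assumes the same hypothetical (cls, age, gender, survived) record format as before; the function and attribute names are illustrative, not prescribed by the assignment.

```python
from collections import Counter

# Hypothetical record format (an assumption): (cls, age, gender, survived)
# tuples, where `survived` is a bool.
ATTRIBUTES = ("class", "age", "gender")
OUTCOMES = ("death", "survival")


def fit_naive_bayes(records):
    """Estimate P(Death) and the six one-dimensional tables by counting.

    Returns (p_death, tables), where tables[outcome][attribute][value] is
    P(attribute = value | outcome) and outcome is "death" or "survival".
    """
    outcome_counts = Counter()
    value_counts = {o: {a: Counter() for a in ATTRIBUTES} for o in OUTCOMES}
    for cls, age, gender, survived in records:
        outcome = "survival" if survived else "death"
        outcome_counts[outcome] += 1
        for attribute, value in zip(ATTRIBUTES, (cls, age, gender)):
            value_counts[outcome][attribute][value] += 1

    p_death = outcome_counts["death"] / sum(outcome_counts.values())
    tables = {
        outcome: {
            attribute: {v: c / outcome_counts[outcome] for v, c in counter.items()}
            for attribute, counter in per_attribute.items()
        }
        for outcome, per_attribute in value_counts.items()
    }
    return p_death, tables


def naive_bayes_death_probability(p_death, tables, cls, age, gender):
    """P(Death | gender, age, cls) under the Naive Bayes assumption.

    Returns None when both unnormalized scores are zero (no evidence either way).
    """
    score = {"death": p_death, "survival": 1.0 - p_death}
    for outcome in OUTCOMES:
        for attribute, value in zip(ATTRIBUTES, (cls, age, gender)):
            # A value never seen with this outcome gets probability 0.
            score[outcome] *= tables[outcome][attribute].get(value, 0.0)
    total = score["death"] + score["survival"]
    return score["death"] / total if total else None
```

Because a single zero-count entry makes the whole product zero, some cells may come out as exactly 0 or 1 (or undefined); this sketch reports those values as they are rather than applying any smoothing. The resulting probabilities can be thresholded at 0.5 exactly as in Task 1 to produce the classification table.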

Task 3

Compare the classification tables you built in Tasks 1 and 2. How well do they match?

Which table would you recommend using for prediction in case of another disaster like the Titanic (assuming it occurred at the same time in history)?
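For the first question of Task 3, a simple measure of agreement is the number of cells that receive the same label in both classification tables. The sketch below assumes both tables are dicts keyed by (class, age, gender) with labels "death", "survival", or None; compare_classifications is a hypothetical helper, and skipping unlabeled cells is a choice of this sketch.

```python
def compare_classifications(labels_a, labels_b):
    """Count cells where two classification tables give the same label.

    Cells left unlabeled (None) in either table are skipped.
    Returns (agreements, cells_compared, keys_that_disagree).
    """
    agreements = 0
    compared = 0
    disagreements = []
    for key, a in labels_a.items():
        b = labels_b.get(key)
        if a is None or b is None:
            continue  # no label to compare in at least one table
        compared += 1
        if a == b:
            agreements += 1
        else:
            disagreements.append(key)
    return agreements, compared, disagreements
```

Listing the disagreeing cells, and not just the count, makes it easier to see which class, age, and gender combinations the two approaches treat differently.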