Assignments

Assignments will be posted here as we go along. Programming assignments should be completed using the Python programming language.

Assignment 1

Part 1: Sentence segmentation, due by the beginning of class 2/10/2009

Part 2: Exercises 3.4 and 3.10, hardcopy due by the beginning of class 2/5/2009

Extra credit: Exercise 3.11


Assignment 2

Part 1: Exercises 6.1 and 6.2. Due 4/30.

Implement the Forward and Viterbi algorithms for the Jason Eisner ice cream example from Chapter 6.
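As a rough guide to the shape of the two algorithms, here is a minimal sketch on a toy two-state HMM. The state names, the start, transition, and emission probabilities, and the function names forward and viterbi are all illustrative assumptions, not the values or interface from the textbook exercise. The sketch also assumes there is no explicit end state.

```python
# Toy two-state HMM. All probabilities below are illustrative placeholders,
# not the values from the textbook exercise.
STATES = ("HOT", "COLD")
START = {"HOT": 0.5, "COLD": 0.5}
TRANS = {"HOT": {"HOT": 0.7, "COLD": 0.3},
         "COLD": {"HOT": 0.4, "COLD": 0.6}}
# Observations are the number of ice creams eaten on a day (1, 2, or 3).
EMIT = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4},
        "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}


def forward(obs):
    """Return P(obs), summed over all hidden state sequences."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {s: EMIT[s][o] * sum(alpha[p] * TRANS[p][s] for p in STATES)
                 for s in STATES}
    return sum(alpha.values())


def viterbi(obs):
    """Return (best_path, probability) for the most likely state sequence."""
    # Each cell holds (probability of the best path ending in s, that path).
    best = {s: (START[s] * EMIT[s][obs[0]], [s]) for s in STATES}
    for o in obs[1:]:
        new = {}
        for s in STATES:
            # Pick the predecessor that maximizes the path probability into s.
            prob, path = max(
                ((best[p][0] * TRANS[p][s], best[p][1]) for p in STATES),
                key=lambda cell: cell[0],
            )
            new[s] = (prob * EMIT[s][o], path + [s])
        best = new
    prob, path = max(best.values(), key=lambda cell: cell[0])
    return path, prob
```

The only difference between the two recurrences is the sum in forward versus the max (plus a stored path) in viterbi. For long observation sequences you would probably want to work in log space to avoid underflow.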

Email me your answers to the specific questions given in the exercises, along with your source code (as a single Python or tar file).

Extra Credit: Due May 6, 2009.

Part 1: 25 Points. Sentiment analysis for movie reviews.

Implement a naive Bayes classifier for up/down movie reviews. For this part you should use a simple all-words approach to classification. For data, use the polarity dataset v2.0 found here.

You should perform a 10-fold cross-validation to report your results.
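One possible way to organize the cross-validation is sketched below. The names k_fold_splits and cross_validate, and the train_fn/classify_fn interface, are hypothetical choices made for illustration. The sketch assumes the documents are (tokens, label) pairs, that there are at least k of them, and that they have already been shuffled so each fold has a mix of both classes.

```python
def k_fold_splits(items, k=10):
    """Yield (train, test) lists; each item lands in exactly one test fold."""
    for i in range(k):
        # Item j goes to test fold j % k, so folds differ in size by at most 1.
        test = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test


def cross_validate(docs, train_fn, classify_fn, k=10):
    """Return the mean accuracy over k folds.

    docs: list of (tokens, label) pairs.
    train_fn(train_docs) -> model
    classify_fn(model, tokens) -> predicted label
    """
    accuracies = []
    for train, test in k_fold_splits(docs, k):
        model = train_fn(train)
        correct = sum(classify_fn(model, tokens) == label
                      for tokens, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k
```

Keeping the classifier behind train_fn and classify_fn means the same harness can score both the basic system and any improved one.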

The application of naive Bayes to this task follows along the same lines as the application to WSD. For a detailed description of NB classifiers as applied to text classification, see Manning et al.'s new IR textbook.
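For orientation, a minimal sketch of an all-words naive Bayes classifier might look like the following. It assumes a multinomial model with add-one smoothing and simply skips words never seen in training; those choices, along with the function names train_nb and classify_nb, are assumptions for illustration rather than a required design.

```python
import math
from collections import Counter


def train_nb(docs):
    """Collect counts from docs, a list of (tokens, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab


def classify_nb(model, tokens):
    """Return the label with the highest log posterior."""
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total = sum(word_counts[label].values())
        # Log prior plus summed log likelihoods; logs avoid underflow.
        score = math.log(label_counts[label] / n_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are skipped
                # Add-one (Laplace) smoothing over the training vocabulary.
                score += math.log(
                    (word_counts[label][w] + 1) / (total + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)
```

A quick usage example on made-up data:

```python
train = [(["great", "fun"], "pos"), (["boring", "dull"], "neg")]
model = train_nb(train)
print(classify_nb(model, ["great", "movie"]))
```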

Part 2: 25 Points.  Improved sentiment analysis.

Find a way to improve the performance of your system. This part is entirely up to you. The point is to get a score better than the basic NB approach. You can stick with NB and improve the feature set, or dump NB and use a better machine learning model (or both). You might read some of the articles pointed to on the data site for inspiration.

For both parts, email me a short writeup with your results, along with your code.