Class News

News about the class will appear here. Please subscribe to the feed since I'll use this more than the class email list.

Due Dates

Your projects and any other remaining coursework that you'd like to have counted should reach me by 5 PM on December 13th.

Project Format

Use the ACM guidelines for conference submissions for your project reports. The report should be no longer than it needs to be to describe your work, and no longer than 8 pages including tables, figures, and references. You're free to use either the Word or LaTeX templates at the ACM site.


A new grade sheet is now up. It includes everything I have up until today. The last column is your percentage of the possible points.

HW1 was worth 50 points.

HW2 was worth 100 points, divided into two parts. You get 50 points if your Part 1 solution worked. The remaining 50 points are scaled relative to the best R-precision that was posted (.4147).

HW3 was scored the same way: 50 points if it worked, plus 50 points prorated against the best F-score (.3988).

HW Total is your % of the total HW points (250).

Quiz 1 was worth 45 points and Quiz 2 was worth 60. They're weighted equally.

The HW total and quiz total carry equal weight in the final column.
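As a rough illustration of how the columns described above could combine, here is a minimal Python sketch. The function names are hypothetical, and it assumes that "weighted equally" means a simple average of the two quiz fractions, and that the final column is a simple average of the HW fraction and the quiz fraction. The actual grade sheet may compute these differently.

```python
HW_POSSIBLE = 250.0     # HW1 (50) + HW2 (100) + HW3 (100)
QUIZ1_POSSIBLE = 45.0
QUIZ2_POSSIBLE = 60.0


def hw_total(hw1: float, hw2: float, hw3: float) -> float:
    """Fraction of the 250 possible homework points."""
    return (hw1 + hw2 + hw3) / HW_POSSIBLE


def quiz_total(quiz1: float, quiz2: float) -> float:
    """Average of the two quiz fractions (assumed meaning of 'weighted equally')."""
    return (quiz1 / QUIZ1_POSSIBLE + quiz2 / QUIZ2_POSSIBLE) / 2.0


def final_fraction(hw1: float, hw2: float, hw3: float,
                   quiz1: float, quiz2: float) -> float:
    """Assumed final column: HW total and quiz total carry equal weight."""
    return (hw_total(hw1, hw2, hw3) + quiz_total(quiz1, quiz2)) / 2.0
```

Under these assumptions, a student with full marks everywhere gets `final_fraction(50, 100, 100, 45, 60) == 1.0`.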

The average Total score is .825, with a standard deviation of .12.

Social Networks

This recent article is relevant to some of our upcoming material.

Sentiment Readings

For the "sentiment" readings, please concentrate on pages 1-23 of the Opinion Mining and Sentiment Analysis text, that is, up through Part 1 of Chapter 4 (Classification and Extraction).


The grades I have so far are posted. The best R-precision on HW 2 was .4147 (average around .33) and the best F1 score on HW 3 was .3988 (average around .32).

Your score for HW 2 will be your points for Part 1 plus (YourRPrecision/.4147)*50. For HW 3, if you handed in something that works, your score will be 50 + (YourF1/.3988)*50.
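The two formulas above can be sketched in Python as follows. The function and constant names are hypothetical. The sketch also assumes that a HW 3 submission that does not work scores 0, which the formula itself does not spell out.

```python
BEST_R_PRECISION = 0.4147  # best R-precision posted for HW 2
BEST_F1 = 0.3988           # best F1 score posted for HW 3


def hw2_score(part1_points: float, r_precision: float) -> float:
    """Part 1 points plus up to 50 points scaled to the best R-precision."""
    return part1_points + (r_precision / BEST_R_PRECISION) * 50


def hw3_score(works: bool, f1: float) -> float:
    """50 points for a working submission plus up to 50 scaled to the best F1."""
    if not works:
        return 0.0  # assumption: a submission that does not work earns nothing
    return 50 + (f1 / BEST_F1) * 50
```

For example, a working HW 3 submission with half the best F1 (.1994) would come out at about 75 points under this reading.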

If you have a 0 anywhere and you sent me that HW, then it probably means your input was badly formed and broke the grading script. Send me mail and I'll tell you what it looks like.

Topic Models

I've posted the reading for topic models.

HW 3 Test Set

The test set for HW 3 has been posted.

HW 3

The details of HW 3 have been posted on the assignments page.

HW 2 Part 2

Details of the second part of the current HW are on the assignments page.


See the schedule page for the course video link.

HW 2 Part 1: Deliverables

For this assignment I want two things sent to me by email: a short description of what you did and discovered with respect to Lucene (your R-precision, what you indexed, what analyzer you used, etc.), and a second attachment with the query/doc id results as described on the assignment page. Name the attachments as follows:



I don't need your code for this one.  And I really don't need the indexed documents either.

Old Quizzes

Quizzes from past semesters are available to help you prepare for the quiz.

Second Assignment

Details of the second homework are posted on the assignments page.


For those of you who are Unix geeks (or want to be), here's why 14480 is the right answer:

grep -v '^.I' med.all | grep -v '^.W' | tr -cs "[:alnum:]-" "\n" | sort | uniq | wc -l
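For anyone who prefers Python, here is a rough equivalent of that pipeline, offered as a sketch rather than a verified reproduction of the count. The function name is hypothetical, and it assumes ASCII input (as `[:alnum:]` behaves in the C locale). Like the shell version, it keeps case distinctions and treats hyphenated words as single tokens.

```python
import re


def count_distinct_tokens(text: str) -> int:
    """Mimic: grep -v '^.I' | grep -v '^.W' | tr -cs "[:alnum:]-" "\\n" | sort | uniq | wc -l"""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline ends the last line; it does not start a new one
    # grep -v '^.I' and grep -v '^.W': drop lines whose second character is I or W
    # (the unescaped '.' matches any first character, as in the shell command).
    kept = [ln for ln in lines if not re.match(r".[IW]", ln)]
    body = "".join(ln + "\n" for ln in kept)
    # tr -cs: squeeze each run of characters outside [A-Za-z0-9-] into one newline.
    squeezed = re.sub(r"[^A-Za-z0-9-]+", "\n", body)
    tokens = squeezed.split("\n")
    if tokens and tokens[-1] == "":
        tokens.pop()
    # sort | uniq | wc -l: count the distinct lines.
    return len(set(tokens))
```

A call such as `count_distinct_tokens(open("med.all").read())` would then play the role of the full command, assuming the file is in the current directory.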

Assignment 1 Terms

Please email me the postings associated with the following terms in your index.








You can retrieve the PowerPoint or PDF slides from each class here.

Assignment 1

The details of Assignment 1 have been posted.


We'll be trying out a new Q&A website called Piazza to handle class discussions concerning any aspect of the class. I encourage you to post questions there rather than sending me email. You'll need to create an account and enroll in the class to use the system. Just visit the Piazza class site to get started.


Welcome to the fall semester. This is your source of information for CSCI 5417. Please add this to your RSS feed so you can keep up with class announcements.

© James H. Martin, 2011