CSCI 7000, Sec 002: Modern Information Retrieval

Fall 2007

This class will cover a range of topics that broadly come under the heading of text-based information retrieval. We'll begin with the basic techniques used to index and query large collections of documents. From there we'll go on to study more advanced topics, including web crawling, graph-based retrieval algorithms such as PageRank, digital libraries, document categorization and clustering, text-based social network analysis, and sentiment analysis.

Nothing beyond a typical undergraduate computer science background is required for this course; familiarity with natural language processing, machine learning, and numerical linear algebra will be helpful.


General Information 

Instructor

Jim Martin

Basics

This class meets Mondays and Wednesdays from 4 to 5:15 in room ECCR 139 in the Engineering Center.

My office hours are Tuesdays from 2 to 3:30 and Wednesdays from 10 to 11:30 in ECOT 735. The best way to communicate with me is to talk to me in person; the next best way is email, and the last is voicemail (which I don't check reliably).

Text and Readings 

The required textbook for this class is Introduction to Information Retrieval by C. Manning, P. Raghavan, and H. Schütze.

We'll also be reading a variety of journal articles and conference papers, which will be made available online. A list of these readings will be posted later in the summer.

Schedule

A tentative schedule will be posted soon.

Requirements

The formal requirements of the course include three programming assignments, several written problem sets, two quizzes, and a final group project. I also expect everyone to come to class prepared (having done the required readings and thought about the material) and to participate in class.

Computer-Related Stuff

This is a graduate class in computer science, so you should be a proficient programmer to take it. We will be making use of the open-source, Java-based Lucene text indexing and retrieval library.

News

9/11/2007

The first assignment has been posted.

4/12/2007

This page posted. The schedule of topics and readings will be added soon.