CSCI 5832: Natural Language Processing

Spring 2008

Jim Martin

This course examines a range of issues related to getting computers to perform useful and interesting tasks involving human language. Among the issues to be discussed are syntactic and semantic analysis, discourse processing, knowledge representation, and machine learning. We'll be discussing these issues in the context of practical applications for information extraction, question answering and machine translation.

 

General Information

Basics

This class meets Tuesday and Thursday from 11:00 to 12:30 in room 1B12 in the Engineering Center.

The best ways to contact me are in person and via email. Don't bother leaving me voicemail; I don't reliably listen to it.

Text and Readings

The required text for this class is a draft version of Speech and Language Processing. It will be available at the CU bookstore. I will assign additional journal and conference articles as needed.

Schedule

The schedule (under revision) for the course includes the topic for each lecture, the assigned readings, pointers to the lecture slides, and the assignments. The lectures for this class begin on January 15 and end on May 1.

Requirements

  • Homework: There will be around 4 or 5 assignments. Unless otherwise stated, all assignments are due in class, at the beginning of class, on their due date.
  • Exams: There will be 3 in-class quizzes and a final. The final exam is scheduled for Monday, May 5 from 1:30 to 4:00. Don't make plans to leave town before the final.
  • Participation: Come to class prepared to discuss the day's material. This means that you should have completed the day's reading before class and should be prepared to participate in class discussions.

The homeworks, quizzes and final will each count as 30% of your grade; the remaining 10% will be based on participation.
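As a rough illustration of how the weights above combine, here is a minimal Python sketch. The function name, the dictionary keys, and the example scores are hypothetical and chosen only for illustration; the sketch also assumes each component is reported as a percentage from 0 to 100, which the syllabus does not specify.

```python
# Weights taken from the grading policy above: homework, quizzes, and the
# final at 30% each, participation at 10%.
WEIGHTS = {"homework": 0.30, "quizzes": 0.30, "final": 0.30, "participation": 0.10}


def course_grade(scores):
    """Return the weighted course percentage.

    `scores` maps each component name to a percentage (0-100). This is an
    assumption made for illustration, not an official grading formula.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"homework": 90, "quizzes": 80, "final": 85, "participation": 100}
print(round(course_grade(example), 1))
```

Note that participation, at 10%, can move the final percentage by as much as ten points.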

CAETE Specific Policies

Due dates for remote CAETE students are one week past the normal campus due date. CAETE students are responsible for keeping current with the lectures; do not allow a backlog of lectures to develop.

Computer Related Stuff

This is a graduate computer science course. You need to be a proficient programmer to take this class.

All the programming assignments in this class will be done using Python. We'll be making some use of NLTK (the Natural Language Toolkit), a suite of Python modules and applications for doing natural language processing. You should see about getting the latest Python installed on whatever platform you plan to use for this course.
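One quick way to check your setup is a short script like the sketch below. It reports the version of the interpreter that runs it and whether an `nltk` package can be found. It assumes a modern Python 3 interpreter (`importlib.util.find_spec` is not available in older releases), and the function name `describe_environment` is made up for this example.

```python
import importlib.util
import sys


def describe_environment():
    """Return (python_version_string, nltk_installed) for this interpreter."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    # find_spec returns None when the package cannot be located;
    # it does not import the package itself.
    nltk_installed = importlib.util.find_spec("nltk") is not None
    return version, nltk_installed


version, has_nltk = describe_environment()
print(f"Python {version}")
print("NLTK is installed" if has_nltk else "NLTK is not installed")
```

If the script reports that NLTK is missing, install it for the same interpreter you plan to use for the assignments.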

Honor Code

The campus has an Honor Code. It includes the following pledge, which will appear on all your exams and which you will need to include on your assignments:

On my honor, as a University of Colorado at Boulder student, I have neither given nor received unauthorized assistance on this work.

Except when I specify otherwise, the assignments in this class will be done individually. You may certainly discuss the assignments with one another, but the final product (program, paper, etc.) must be yours alone.

In the past, the primary problem area for this class has involved unintentional plagiarism. Unintentional or not, it can still get you an F in the course and/or kicked out of school.

News

3/11/2008

The long-awaited next homework is now posted.

2/6/2008

The technology underlying the application described in this New York Times article is discussed in Chapter 22.

2/6/2008

A set of older quizzes is now available.

1/28/2008

There was an interesting article on the current state of speech recognition in yesterday's New York Times.

There's also an article on text analytics (the subject of Chapter 22) in today's Wall Street Journal in the technology section. You'll have to look at the hardcopy to see it (there are free copies in the Biz school lobby).

1/28/2008

The first assignment is due tomorrow.

1/18/2008

You can find the slides for the lectures here. I'll post the original PowerPoint slides as well as a PDF version with 3 slides per page for printing and space for notes.

1/17/2008

The course reader (textbook draft chapters) is available in the bookstore.

1/14/2008

The bookstore is preparing the printed reader for this course. It should be ready by the end of the week.

Chapter 1 is still on-line.