Department of Computer Science University of Colorado Boulder

Colloquium - Nenkova

DLC 1B70

Content Selection and Rewrite for Generic Multi-Document Summarization of News
Columbia University

Two of the main qualities expected from an automatic summarizer are the ability to select useful and interesting content and the ability to present it as fluent, readable text. In my work, I address both problems, content selection and readability, and validate my approach through empirical analysis of results from previous NIST-run summarization evaluations.

Our analyses of past evaluations show that more progress has been achieved in the area of generic multi-document summarization than in single-document summarization. But what has allowed these good results? In this talk I will discuss two aspects of a summarizer that contribute to good performance in content selection -- compositional computation of importance and context sensitivity. I will present a summarizer that uses frequency as the sole feature for estimating sentence importance. With the addition of a simple model for context adjustment, the summarizer performs as well as state-of-the-art systems and allows us to focus on readability issues.
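
As an illustration only, the idea of frequency-based sentence scoring with context adjustment can be sketched in a few lines of Python. The abstract does not give the details, so everything specific here is an assumption: the function names `tokenize` and `summarize` are hypothetical, a sentence's importance is taken to be the average probability of its words (one possible compositional scheme), and context adjustment is modeled by squaring the probability of every word in a sentence once that sentence has been selected.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def summarize(sentences, max_sentences=2):
    """Greedily pick sentences using word frequency as the only feature.

    Assumptions (not taken from the talk abstract):
    - sentence score = mean probability of its words
    - after a sentence is picked, the probability of each of its words
      is squared, so already-covered content counts for less
    """
    tokenized = [tokenize(s) for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    def score(i):
        toks = tokenized[i]
        return sum(prob[w] for w in toks) / len(toks)

    # Skip sentences that contain no tokens at all.
    remaining = [i for i, toks in enumerate(tokenized) if toks]
    chosen = []
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
        # Context adjustment: down-weight words the summary already covers.
        for w in set(tokenized[best]):
            prob[w] **= 2
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    docs = [
        "The storm hit the coast on Monday.",
        "The storm damaged homes along the coast.",
        "Rescue teams arrived on Tuesday.",
    ]
    for line in summarize(docs, max_sentences=2):
        print(line)
```

Without the adjustment step, a greedy selector of this kind would tend to keep choosing sentences built from the same frequent words; squaring the probabilities is one simple way to make later choices sensitive to what has already been said.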

Analyses of automatic summarizers' output also show that summarizers do poorly in the area of linguistic quality. One of the problematic aspects is the referential clarity of summaries. I will present my experiments in summary rewrite, which specifically aim at improving the clarity of references, by either dropping unnecessary or repetitive information, or including additional descriptive information. I will discuss how salience and familiarity of referents can be inferred automatically and used in the rewrite process, reproducing human choices with high accuracy.

Ani Nenkova is currently a postdoctoral fellow at Stanford University. She obtained her PhD degree from Columbia University, where she worked on multi-document summarization of news. She holds a BS/MS degree from Sofia University (Bulgaria), where she specialized in mathematical logic. She has been actively involved in the NIST-run Document Understanding Conference (DUC) summarization evaluations as part of the Columbia team; she also chaired a work group on linguistic quality in 2003/2004 and supervised and organized the pyramid evaluation in DUC 2005. Her work in summarization combines semantic and discourse knowledge with statistical processing and has contributed to a better understanding of the process of summarization.

Hosted by James Martin.
The speaker is a candidate for a faculty position in the Department of Computer Science.

The Department holds colloquia throughout the Fall and Spring semesters. These colloquia, open to the public, are typically held on Thursday afternoons, but sometimes occur at other times as well. If you would like to receive email notification of upcoming colloquia, subscribe to our Colloquia Mailing List. If you would like to schedule a colloquium, see Colloquium Scheduling.

Sign language interpreters are available upon request. Please contact Stephanie Morris at least five days prior to the colloquium.

Department of Computer Science
College of Engineering and Applied Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA

Engineering Center Office Tower
ECOT 717
FAX +1-303-492-2844