
Colloquium - Nenkova

Content Selection and Rewrite for Generic Multi-Document Summarization of News
Columbia University

Two of the main qualities expected from an automatic summarizer are the ability to select useful and interesting content and the ability to present it as fluent, readable text. In my work, I address both content selection and readability, and I validate my approach through empirical analysis of results from previous NIST-run summarization evaluations.

Our analyses of past evaluations show that more progress has been achieved in generic multi-document summarization than in single-document summarization. But what has made these good results possible? In this talk I will discuss two aspects of a summarizer that contribute to good performance in content selection: compositional computation of importance and context sensitivity. I will present a summarizer that uses frequency as the sole feature for estimating sentence importance. With the addition of a simple model for context adjustment, the summarizer performs as well as state-of-the-art systems and allows us to focus on readability issues.
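As a rough illustration of the two ideas named above, here is a minimal Python sketch of a frequency-only sentence selector. It is not the system presented in the talk: the abstract does not spell out the details, so the sketch assumes that a sentence's importance is the mean probability of its words (one compositional choice) and that the context adjustment squares the probability of every word already covered by a chosen sentence. The names `tokenize` and `summarize` are hypothetical.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def summarize(sentences, max_sentences=2):
    """Greedily pick sentences using word frequency as the only feature."""
    tokenized = [tokenize(s) for s in sentences]
    counts = Counter(word for tokens in tokenized for word in tokens)
    total = sum(counts.values())
    if total == 0:
        return []
    # Unigram probability of each word over the whole input.
    prob = {word: count / total for word, count in counts.items()}

    def score(index):
        # Assumption: sentence importance is composed from word
        # importance by averaging the word probabilities.
        tokens = tokenized[index]
        return sum(prob[word] for word in tokens) / len(tokens)

    candidates = [i for i, tokens in enumerate(tokenized) if tokens]
    chosen = []
    while candidates and len(chosen) < max_sentences:
        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
        # Assumption: the context adjustment squares the probability of
        # words already in the summary, so repeated content scores lower.
        for word in set(tokenized[best]):
            prob[word] = prob[word] ** 2
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

On an input such as "Storm hits coast", "Storm hits coast again", "Rescue teams help residents", "Rescue teams arrive", a two-sentence summary keeps the first and last sentences: once the first is chosen, the near-duplicate second sentence is down-weighted and the selector moves on to new content.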

Analyses of automatic summarizers' output also show that summarizers do poorly in the area of linguistic quality. One problematic aspect is the referential clarity of summaries. I will present my experiments in summary rewrite, which aim specifically at improving the clarity of references by either dropping unnecessary or repetitive information or adding descriptive information. I will discuss how the salience and familiarity of referents can be inferred automatically and used in the rewrite process, reproducing human choices with high accuracy.
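The rewrite idea can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the method from the talk: mentions of a single referent are assumed to be already marked with a hypothetical `[REF]` slot, the first mention receives a descriptive form, and later mentions receive a short form. Inferring salience and familiarity, which would decide how much description a referent needs, is not modeled here.

```python
def rewrite_references(sentences, full_form, short_form, slot="[REF]"):
    """Fill each marked mention: descriptive form first, short form afterwards."""
    first_mention_done = False
    rewritten = []
    for sentence in sentences:
        parts = sentence.split(slot)
        rebuilt = parts[0]
        for tail in parts[1:]:
            # Add description on first mention, drop it on repeats.
            rebuilt += (short_form if first_mention_done else full_form) + tail
            first_mention_done = True
        rewritten.append(rebuilt)
    return rewritten
```

For example, with the invented forms "Jane Doe, the city's mayor," and "Doe", the sentences "[REF] visited the coast." and "Reporters asked [REF] about the storm." become "Jane Doe, the city's mayor, visited the coast." and "Reporters asked Doe about the storm."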

Ani Nenkova is currently a postdoctoral fellow at Stanford University. She obtained her PhD degree from Columbia University, where she worked on multi-document summarization of news. She holds a BS/MS degree from Sofia University (Bulgaria), where she specialized in mathematical logic. She has been actively involved in the NIST-run Document Understanding Conference (DUC) summarization evaluations as part of the Columbia team; she also chaired a working group on linguistic quality in 2003/2004 and supervised and organized the pyramid evaluation in DUC 2005. Her work in summarization combines semantic and discourse knowledge with statistical processing and has contributed to a better understanding of the summarization process.

Hosted by James Martin.
The speaker is a candidate for a faculty position in the Department of Computer Science.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA