4/17/2006 3:30pm-4:30pm DLC 1B70
Content Selection and Rewrite for Generic Multi-Document Summarization of News
Ani Nenkova, Columbia University
Two of the main qualities expected from an automatic summarizer are the ability
to select useful and interesting content and the ability to present it as
fluent, readable text. In my work, I address both content selection and
readability, and validate my approach through empirical analysis of results
from previous NIST-run summarization evaluations.
Our analyses of past evaluations show that more progress has been achieved in
the area of generic multi-document summarization than in single-document
summarization. But what has allowed these good results? In this talk I will
discuss two aspects of a summarizer that contribute to good performance in
content selection -- compositional computation of importance and context
sensitivity. I will present a summarizer that uses frequency as the sole
feature for estimating sentence importance. With the addition of a simple model
for context adjustment, the summarizer performs as well as state-of-the-art
systems and allows us to focus on readability issues.
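As a rough illustration of the two ideas named above, the following minimal
sketch scores each sentence compositionally (as the average frequency-based
weight of its words) and then adjusts for context by down-weighting words
already covered by a selected sentence. The function names, the averaging
rule, and the squaring update are assumptions made for this example; the
abstract does not specify the speaker's exact formulation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def summarize(sentences, max_sentences=2):
    """Greedy frequency-based sentence selection (illustrative sketch only)."""
    tokenized = [tokenize(s) for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    # Word importance is estimated as relative frequency in the input.
    prob = {w: c / total for w, c in counts.items()}

    def score(i):
        # Compositional importance: average weight of the sentence's words.
        toks = tokenized[i]
        return sum(prob[w] for w in toks) / len(toks) if toks else 0.0

    chosen = []
    candidates = set(range(len(sentences)))
    while candidates and len(chosen) < max_sentences:
        # Iterate in index order so ties go to the earliest sentence.
        best = max(sorted(candidates), key=score)
        chosen.append(best)
        candidates.remove(best)
        # Context adjustment (assumed rule): squaring a weight below 1
        # shrinks it, so words already in the summary count for less.
        for w in set(tokenized[best]):
            prob[w] = prob[w] ** 2
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


docs = ["storm hits coast", "storm hits coast again", "shelters open inland"]
summary = summarize(docs, max_sentences=2)
```

With this toy input, the first sentence is selected first; after its words are
down-weighted, the third sentence outscores the near-duplicate second one, so
the adjustment step is what keeps the summary from repeating itself.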
Analyses of automatic summarizers' output also show that summarizers do poorly
in the area of linguistic quality. One of the problematic aspects is the
referential clarity of summaries. I will present my experiments in summary
rewrite, which specifically aim at improving the clarity of references, by
either dropping unnecessary or repetitive information, or including additional
descriptive information. I will discuss how salience and familiarity of
referents can be inferred automatically and used in the rewrite process,
reproducing human choices with high accuracy.
Ani Nenkova is currently a postdoctoral fellow at Stanford University. She
obtained her PhD degree from Columbia University, where she worked on
multi-document summarization of news. She holds a BS/MS degree from Sofia
University (Bulgaria), where she specialized in mathematical logic. She has
been actively involved in the NIST-run Document Understanding Conference
summarization evaluations as part of the Columbia team; she also chaired a
work group on linguistic quality in 2003/2004 and supervised and organized the
pyramid evaluation in DUC 2005. Her work in summarization combines semantic and
discourse knowledge with statistical processing and has contributed to a
better understanding of the process of summarization.
Hosted by James Martin. The speaker is a candidate for a faculty position in the Department of Computer Science.
The Department holds colloquia throughout the Fall and Spring semesters. These
colloquia, open to the public, are typically held on Thursday afternoons, but
sometimes occur at other times as well.
If you would like to receive email notification of upcoming colloquia,
subscribe to our Colloquia Mailing List.
If you would like to schedule a colloquium, see Colloquium Scheduling.
Sign language interpreters are available upon request. Please contact
Stephanie Morris at least five days prior to the colloquium.