12/18/2007 9:00am-11:00am CINC 102
Unsupervised Named-Entity Disambiguation with Applications to Question Answering
Computer Science PhD Candidate
As the amount of information in contemporary society expands at an ever more
rapid pace, the named-entity ambiguity problem becomes increasingly serious in
many fields, such as information integration, cross-document co-reference, and
question answering. Individuals are so glutted with information that searching
for data presents real problems. It is therefore crucial to develop
methodologies that can efficiently disambiguate names in any given set of
data.
In this thesis, the name ambiguity problem is explored for Question Answering
(QA) systems. In QA, searching a large corpus for the answer to each question
is costly in time. A named-entity disambiguation system can serve as a
component of the QA system and limit the search scope, thus reducing search
time.
This dissertation limits the work to personal name disambiguation and focuses
mainly on feature extraction for a personal name disambiguation system. Two
different personal name disambiguation systems have been developed: one
extracts features with the help of supervised NLP tools, and the other uses
unsupervised features. Both systems aim to extract state-of-the-art features
with different NLP tools and from broad resources, and both perform very well
on a news corpus or a web corpus.
Committee:
James Martin, Professor (Chair)
Martha Palmer, Department of Linguistics
Wayne Ward, Center for Computational Language and Education Research (CLEAR)
Gregory Grudic, Assistant Professor
Kadri Hacioglu, Rosetta Stone Labs Boulder
This defense will be held at the Center for Computational Language and Education Research in the Center for Innovation and Creativity at 1777 Exposition Drive.