
Thesis Defense - Chen

Unsupervised Named-Entity Disambiguation with Applications to Question Answering
Computer Science PhD Candidate

As the amount of information in contemporary society expands at an ever more rapid pace, the named-entity ambiguity problem becomes increasingly serious in many fields, such as information integration, cross-document co-reference, and question answering. Individuals are so glutted with information that searching for data presents real problems. It is therefore crucial to develop methodologies that can efficiently disambiguate names in any given set of data.

In this thesis, the name ambiguity problem is explored for Question Answering (QA) systems. In QA, searching a large corpus for the answer to each question carries a time cost. A named-entity disambiguation system can serve as a component of the QA system and limit the search scope, thus reducing search time.

This dissertation limits the work to personal name disambiguation and focuses mainly on feature extraction for a personal name disambiguation system. Two different personal name disambiguation systems have been developed: one extracts features with the help of supervised NLP tools, and the other uses unsupervised features. Both systems try to extract state-of-the-art features with different NLP tools and from broad resources, and both perform very well on a news corpus or a web corpus.

Committee: James Martin, Professor (Chair)
Martha Palmer, Department of Linguistics
Wayne Ward, Center for Computational Language and Education Research (CLEAR)
Gregory Grudic, Assistant Professor
Kadri Hacioglu, Rosetta Stone Labs Boulder

This defense will be held at the Center for Computational Language and Education Research in the Center for Innovation and Creativity at 1777 Exposition Drive.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:20)