
Thesis Defense - Ogren

Coordination Resolution in Biomedical Texts
Computer Science PhD Candidate
12/6/2010
10:00am-12:00pm

One of the most difficult and least studied sources of structural ambiguity in text is syntactic coordination. In this dissertation, coordination resolution is the task of determining the correct conjuncts of the coordinating conjunctions "and" and "or", and it is explored here for biomedical scientific literature. The problem is challenging because conjunctions are highly promiscuous with respect to the kinds of words and phrases they will coordinate: a conjunct may be a single word, such as an adjective, or a much longer verb phrase. The main contribution of this work is an efficient and accurate coordination resolution algorithm that outperforms both the previous state of the art on this task and a state-of-the-art syntactic parser applied to the same task. The algorithm uses binary classifiers to predict conjunct boundaries. One of the more interesting features that improved the performance of these classifiers uses probabilities from a language model built from large quantities of readily available unlabeled data. These language-model-derived features exploit the intuition that a sentence containing a coordinating conjunction can often be rephrased as two or more shorter sentences derived from the coordination structure. Candidate sentences corresponding to different possible coordination structures are generated and compared using the language model to help determine which coordination structure is best. Performance is further improved by first predicting the syntactic type of the coordination structure and then using that prediction when training the boundary classifiers and when classifying conjunct boundaries. Finally, a system that integrates the new approach with a syntactic parser is shown to outperform either approach in isolation.
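The rephrasing intuition behind the language-model features can be illustrated with a small sketch. It assumes an add-one smoothed bigram model as a stand-in for the language model (the abstract does not say what kind of model was used), assumes the two conjuncts sit immediately on either side of the conjunction, and uses hypothetical helper names (BigramLM, rephrase, best_left_conjunct) and an invented toy corpus. For each candidate left conjunct it builds one shorter sentence per conjunct, scores each with the model's average log probability per token, and keeps the highest-scoring candidate. The abstract describes such scores as features for the boundary classifiers; picking the span directly from them, as done here, is a simplification.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram model; an assumed stand-in for the thesis's language model."""

    def __init__(self, sentences):
        self.context = Counter()   # how often each token appears as a bigram history
        self.bigrams = Counter()   # (previous, current) pair counts
        vocab = set()              # tokens that can be predicted, including </s>
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            for prev, cur in zip(tokens[:-1], tokens[1:]):
                self.context[prev] += 1
                self.bigrams[(prev, cur)] += 1
                vocab.add(cur)
        self.vocab_size = len(vocab)

    def avg_logprob(self, sent):
        """Mean log probability per predicted token, so short and long sentences compare fairly."""
        tokens = ["<s>"] + sent + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.context[prev] + self.vocab_size
            total += math.log(num / den)
        return total / (len(tokens) - 1)


def rephrase(tokens, left, right):
    """Replace the whole coordination with each conjunct in turn.

    left and right are (start, end) token spans with end exclusive.
    """
    prefix, suffix = tokens[:left[0]], tokens[right[1]:]
    return [prefix + tokens[start:end] + suffix for start, end in (left, right)]


def best_left_conjunct(lm, tokens, conj_idx, right):
    """Hypothetical helper: try every left-conjunct start and keep the best-scoring span."""
    candidates = [(start, conj_idx) for start in range(conj_idx)]

    def score(left):
        pieces = rephrase(tokens, left, right)
        return sum(lm.avg_logprob(p) for p in pieces) / len(pieces)

    return max(candidates, key=score)


# Toy "unlabeled" corpus, invented for illustration only.
corpus = [s.split() for s in [
    "the cells were stained",
    "the cells were washed",
    "the samples were washed",
]]
lm = BigramLM(corpus)
tokens = "the cells were stained and washed".split()
best = best_left_conjunct(lm, tokens, conj_idx=4, right=(5, 6))
# prints (3, 4) followed by the two rephrased sentences
print(best, rephrase(tokens, best, (5, 6)))
```

Averaging per token matters in this toy example: with raw sentence log probabilities, the candidate that leaves "washed" as a one-word sentence would win simply because a shorter sentence accumulates fewer negative terms.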

Committee: Lawrence Hunter, University of Colorado School of Medicine (Chair)
Martha Palmer, Department of Linguistics
James Martin, Professor
Wayne Ward, Research Professor
Rodney Nielsen, Boulder Language Technologies

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
webmaster@cs.colorado.edu
www.cs.colorado.edu
May 5, 2012 (14:20)
©2012