Department of Computer Science University of Colorado Boulder
Thesis Defenses, 2010-2011

Thesis Defense - Ogren

CINC 102

Coordination Resolution in Biomedical Texts
Computer Science PhD Candidate

One of the most difficult and least studied sources of structural ambiguity in text is syntactic coordination. In this dissertation, coordination resolution is the task of determining the correct conjuncts of the coordinating conjunctions "and" and "or"; it is explored here for biomedical scientific literature. The problem is challenging because conjunctions are highly promiscuous with respect to the kinds of words and phrases they coordinate: a conjunct may be a single word, such as an adjective, or a much longer verb phrase. The main contribution of this work is an efficient and accurate coordination resolution algorithm that outperforms both the previous state of the art on this task and a state-of-the-art syntactic parser applied to the same task. The algorithm uses binary classifiers to predict conjunct boundaries. One of the more interesting features that improved the performance of these classifiers leverages probabilities generated by a language model built from large quantities of readily available unlabeled data. The language-model-derived features exploit the intuition that a sentence containing a coordinating conjunction can often be rephrased as two or more shorter sentences derived from the coordination structure. Candidate sentences corresponding to different possible coordination structures are generated and compared using the language model to help determine which coordination structure is best. Performance is further improved by first predicting the syntactic type of the coordination structure and using this prediction both to train the classifiers and to classify conjunct boundaries. Finally, a system that integrates the new approach with a syntactic parser is shown to outperform either approach in isolation.
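The language-model comparison described in the abstract can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the dissertation's implementation: a tiny add-one-smoothed bigram model stands in for a language model trained on large unlabeled corpora, the three-sentence corpus is invented, and the function names (`train_bigram_lm`, `rephrase`, `best_structure`) are hypothetical. Each candidate coordination structure is written as a (left_start, right_end) token span around the conjunction; the sketch rephrases the sentence once per conjunct and prefers the candidate whose rephrasings score highest. The abstract describes these probabilities as features for binary boundary classifiers, so using them directly as a decision rule, as below, is a simplification.

```python
import math
from collections import Counter


def train_bigram_lm(corpus):
    """Train a toy add-one-smoothed bigram LM (stand-in for a large-corpus LM).

    Returns a function giving the average per-bigram log-probability of a
    tokenized sentence, so sentences of different lengths are comparable.
    """
    contexts, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab_size = len({w for sent in corpus for w in sent}) + 1  # +1 for "</s>"

    def score(sent):
        tokens = ["<s>"] + sent + ["</s>"]
        pairs = list(zip(tokens[:-1], tokens[1:]))
        total = sum(
            math.log((bigrams[pair] + 1) / (contexts[pair[0]] + vocab_size))
            for pair in pairs
        )
        return total / len(pairs)

    return score


def rephrase(tokens, conj, left_start, right_end):
    """Hypothetical helper: build one sentence per conjunct.

    The conjuncts are tokens[left_start:conj] and tokens[conj + 1:right_end];
    tokens outside [left_start, right_end) are shared by both rephrasings.
    """
    prefix, suffix = tokens[:left_start], tokens[right_end:]
    left, right = tokens[left_start:conj], tokens[conj + 1:right_end]
    return [prefix + left + suffix, prefix + right + suffix]


def best_structure(tokens, conj, candidates, score):
    """Hypothetical decision rule: pick the candidate span whose rephrasings
    receive the highest mean LM score."""
    def candidate_score(span):
        sents = rephrase(tokens, conj, *span)
        return sum(score(s) for s in sents) / len(sents)

    return max(candidates, key=candidate_score)


# Invented miniature "unlabeled" corpus, for illustration only.
corpus = [
    "binding of p53 proteins was observed".split(),
    "binding of p63 proteins was reduced".split(),
    "p63 proteins bind dna".split(),
]
score = train_bigram_lm(corpus)

sentence = "binding of p53 and p63 proteins".split()
conj = sentence.index("and")
narrow = (2, 5)  # conjuncts "p53" / "p63"; "proteins" is shared
wide = (0, 6)    # conjuncts "binding of p53" / "p63 proteins"
print(best_structure(sentence, conj, [narrow, wide], score))  # (2, 5)
```

Length normalization matters in this toy example: if `score` returned the raw total log-probability instead of the per-bigram average, the wide reading would win simply because its rephrasings are shorter and therefore have fewer probability factors.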

Committee: Lawrence Hunter, University of Colorado School of Medicine (Chair)
Martha Palmer, Department of Linguistics
James Martin, Professor
Wayne Ward, Research Professor
Rodney Nielsen, Boulder Language Technologies

Department of Computer Science
College of Engineering and Applied Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA

Engineering Center Office Tower
ECOT 717
FAX +1-303-492-2844
©2012 Regents of the University of Colorado
May 5, 2012 (13:40)