Department of Computer Science
University of Colorado Boulder
Thesis Defense - Ogren
12/6/2010 10:00am-12:00pm CINC 102
Coordination Resolution in Biomedical Texts
Computer Science PhD Candidate
One of the most difficult and least studied sources of structural ambiguity in
text is syntactic coordination. Coordination resolution, for this dissertation,
is the task of determining the correct conjuncts of the coordinating
conjunctions "and" and "or" and is explored here for biomedical scientific
literature. It is a challenging problem because conjunctions are highly
promiscuous with respect to the kinds of words and phrases that they are
willing to coordinate. For example, a conjunct may be a single word, such as
an adjective, or a much longer verb phrase. The main contribution of this
work is an efficient and accurate coordination resolution algorithm that
outperforms both the previous state of the art on this task and a
state-of-the-art syntactic parser applied to the same task. The algorithm uses binary
classifiers to predict conjunct boundaries. One of the more interesting
features that improved the performance of these classifiers leverages
probabilities generated by a language model that is built from large
quantities of readily available unlabeled data. The language-model-derived
features exploit the intuition that sentences containing coordinating
conjunctions can often be rephrased as two or more smaller sentences derived
from the coordination structure. Candidate sentences corresponding to different
possible coordination structures are generated and compared using the language
model to help determine which coordination structure is best. Performance is
further improved by first predicting the syntactic type of the coordination
structure and using this type prediction to help train and classify conjunct
boundaries. Finally, a system that integrates the new approach with a syntactic
parser is shown to outperform either approach in isolation.
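
The rephrasing intuition described in the abstract can be illustrated with a toy sketch. This is not the dissertation's implementation: the function names, the add-one-smoothed bigram model, and the exhaustive candidate search are all assumptions made for illustration. In the approach described above, language model scores serve as features for the boundary classifiers; here they simply rank candidate structures directly.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences.

    Hypothetical stand-in for a language model built from unlabeled text.
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def avg_log_prob(tokens, lm):
    """Average add-one-smoothed bigram log-probability per transition."""
    unigrams, bigrams = lm
    vocab = len(unigrams) + 1  # one extra slot for unseen words
    padded = ["<s>"] + [t.lower() for t in tokens] + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
    return total / (len(padded) - 1)


def rephrase(tokens, conj, left_start, right_end):
    """Rewrite one coordination as two sentences, one per conjunct.

    Assumed layout: the left conjunct is tokens[left_start:conj] and the
    right conjunct is tokens[conj + 1:right_end + 1].
    """
    prefix, suffix = tokens[:left_start], tokens[right_end + 1:]
    left = tokens[left_start:conj]
    right = tokens[conj + 1:right_end + 1]
    return [prefix + left + suffix, prefix + right + suffix]


def best_structure(tokens, conj, lm):
    """Return the (left_start, right_end) pair whose rephrasings score best.

    Simplification: picks the top-scoring candidate outright instead of
    passing the scores to a classifier as features.
    """
    candidates = [
        (left_start, right_end)
        for left_start in range(conj)
        for right_end in range(conj + 1, len(tokens))
    ]

    def score(candidate):
        return sum(avg_log_prob(s, lm) for s in rephrase(tokens, conj, *candidate))

    return max(candidates, key=score)


tokens = "the cells express cd4 and cd8 receptors".split()
lm = train_bigram_lm([
    "the cells express cd4 receptors",
    "the cells express cd8 receptors",
])
for sentence in rephrase(tokens, conj=4, left_start=3, right_end=5):
    print(" ".join(sentence))
```

With the conjunct boundaries set to the single words on either side of "and", the sketch prints the two shorter sentences that the coordination stands for. A model trained on a realistically large corpus would be needed before the scores say anything meaningful about competing structures.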