Thesis Defense - Bethard

Finding Event, Temporal and Causal Structure in Text: A Machine Learning Approach
Computer Science PhD Candidate
10/31/2007
12:30pm-2:30pm

Humans often describe their experiences through the event, temporal and causal structures they perceive. These structures are often expressed in textual forms, for example in timelines, where text is summarized by aligning events with the times at which they occurred. These same sorts of temporal-causal structures are also useful for a variety of computational tasks, like summarization and question answering. However, to reason over such structures, they must first be extracted from their textual representations and organized into a machine-readable form.

This work demonstrates that various important parts of the event, temporal and causal structure of a text can be extracted automatically using machine learning methods. Events, which serve as the basic anchors of temporal and causal relations, can be extracted with F-measures in the 70s and 80s using a word-chunking approach. Temporal relations between adjacent events in some common syntactic constructions can be identified with almost 90% accuracy using pairwise classification. Causal relations are much more difficult, but initial work suggests that even this task may become tractable for machine learning methods.

Analyses of the various tasks lead to several conclusions about how best to approach the automatic extraction of temporal-causal structure. Tasks with little linguistic motivation had low agreement between humans and low machine learning model performance. Tasks with clear annotation guidelines based on known linguistic constructions had much higher inter-annotator agreement and much better model performance. Thus, future progress will depend on careful task selection guided by linguistic knowledge about how event, temporal and causal relations are expressed in text.

Committee: James Martin, Professor (Chair)
Wayne Ward, Research Professor
Martha Palmer, Department of Linguistics
Daniel Jurafsky, Stanford University
Tamara Sumner, Associate Professor

This defense will be held at the Center for Spoken Language Research in the Center for Innovation and Creativity at 1777 Exposition Drive.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA