
Colloquium - Marcus

Automatically Parsing English -- At Last ...
Mitchell Marcus
University of Pennsylvania

For over a decade, computational linguists have competed aggressively to build the best-performing parser for English, using ever-improving statistical machine learning methods. Most of these parsers were trained and tested on the Penn Treebank, a corpus of syntactic annotation developed by the speaker, yet until recently none attempted to recover the full structure of the annotated corpus. Unfortunately, the syntactic structures they ignored are crucial to standard views of how semantic structure is decoded from syntax.

This talk will review why this gap occurred and describe a full Treebank parser recently constructed at the University of Pennsylvania. This parser recovers both the semantic and syntactic function tags included in the Treebank and all significant null elements (including both passive and WH-traces) in the original annotation. It demonstrates that sensitivity to linguistic issues, combined with well-understood current machine-learning methods, yields state-of-the-art performance.

This talk describes joint work with Ryan Gabbard and Seth Kulick.

Sponsored by the Institute of Cognitive Science.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA