Thesis Defense - Schone

Toward Knowledge-Free Induction of Machine-Readable Dictionaries
Patrick Schone
Computer Science PhD Candidate

Machine-readable dictionaries (MRDs) have found uses in many natural language tasks. MRDs provide a core of linguistic knowledge that helps make many of these tasks feasible and adds to the "intelligent" appearance of algorithms. Current MRDs are almost always generated either by digitizing hardcopy dictionaries or by hand construction. Both methodologies require many person-years of painstaking effort. In this research, we seek knowledge-free and language-independent strategies for inducing components of MRDs. In particular, we concentrate on the tasks of inducing dictionary headwords (including multiword units), inflectional morphologies, part-of-speech clusters and labels, and rudimentary semantic components. Unlike past research efforts, our algorithms make use of no language-specific information, and most, in fact, use no human input whatsoever. We also apply these algorithms to texts from various languages and report performance on appropriate gold standards.

Committee: James Martin, Associate Professor (Co-chair)
Daniel Jurafsky, Associate Professor (Co-chair)
Elizabeth Bradley, Associate Professor
Thomas Landauer, Department of Psychology
Charles Wooters, International Computer Science Institute
Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:20)