Department of Computer Science University of Colorado Boulder

Thesis Defense - Schone

 
11/19/2001
9:00am-11:00am
ECOT 831

Toward Knowledge-Free Induction of Machine-Readable Dictionaries
Patrick J. Schone
Computer Science PhD Candidate

Machine-readable dictionaries (MRDs) have found uses in many natural language tasks. MRDs provide a core of linguistic knowledge that helps make many of these tasks feasible and adds to the "intelligent" appearance of algorithms. Current MRDs are almost always generated either by digitizing hardcopy dictionaries or by hand construction. Both methodologies require many person-years of painstaking effort. In this research, we seek knowledge-free and language-independent strategies for inducing components of MRDs. In particular, we concentrate on the tasks of inducing dictionary headwords (including multiword units), inflectional morphologies, part-of-speech clusters and labels, and rudimentary semantic components. Unlike past research efforts, our algorithms make use of no language-specific information, and most, in fact, use no human input whatsoever. We also apply these algorithms to texts from various languages and report performance on appropriate gold standards.

Committee: James Martin, Associate Professor (Co-chair)
Daniel Jurafsky, Associate Professor (Co-chair)
Elizabeth Bradley, Associate Professor
Thomas Landauer, Department of Psychology
Charles Wooters, International Computer Science Institute

 