Department of Computer Science University of Colorado Boulder
Thesis Defenses · 2003-2004

Thesis Defense - Coccaro

RL6 C181

Latent Semantic Analysis as a Tool to Improve Automatic Speech Recognition Performance
Noah B. Coccaro
Computer Science PhD Candidate

This thesis explores the use of Latent Semantic Analysis (LSA) to augment an N-gram language model and thereby improve the accuracy of a large-vocabulary speech recognition system. It discusses possible solutions to three problems that arise when integrating LSA with an N-gram model.

First, two approaches to deriving a probability from a semantic distance are examined. These approaches introduce numerous parameters, for which optimal values are found. Second, because the N-gram and LSA models have different strengths, it is necessary to develop confidence metrics that indicate when to rely more strongly on a particular model. Several such metrics are developed and used. Lastly, the problem of combining the two probability models is explored. Several approaches to combining the models, including a geometric mean and a decision tree, were evaluated. Compared to a standard trigram model, experimental results showed a reduction in perplexity of approximately 14% and a significant reduction in the word error rate of a speech recognizer of 0.5% absolute (3.0% relative).
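
The three steps in the paragraph above can be illustrated with a minimal Python sketch. It is not the method used in the thesis: the function names (`lsa_probs`, `geometric_combine`, `perplexity`), the shift-and-normalize mapping from cosine similarity to probability, and the parameters `gamma` and `lam` are all assumptions chosen for illustration. Here `lam` stands in for a confidence weight that says how strongly to rely on the LSA model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lsa_probs(context_vec, word_vecs, gamma=1.0):
    """Map semantic similarities to a probability distribution.

    Hypothetical scheme: shift each cosine by the minimum cosine,
    raise it to the power gamma, then normalize over the vocabulary.
    """
    sims = {w: cosine(context_vec, v) for w, v in word_vecs.items()}
    low = min(sims.values())
    scores = {w: (s - low) ** gamma for w, s in sims.items()}
    total = sum(scores.values())
    if total == 0.0:  # all words equally similar: fall back to uniform
        return {w: 1.0 / len(scores) for w in scores}
    return {w: s / total for w, s in scores.items()}


def geometric_combine(p_ngram, p_lsa, lam=0.5, floor=1e-12):
    """Weighted geometric mean of two distributions, renormalized.

    lam = 0 uses only the N-gram model; lam = 1 uses only the LSA model.
    The floor keeps a zero probability from zeroing out the product.
    """
    scores = {
        w: max(p_ngram[w], floor) ** (1.0 - lam)
        * max(p_lsa.get(w, 0.0), floor) ** lam
        for w in p_ngram
    }
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


def perplexity(word_probs):
    """Perplexity of a sequence, given the probability assigned to each word."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))


# Toy example with made-up two-dimensional "LSA" vectors.
word_vecs = {"bank": [1.0, 0.0], "river": [0.8, 0.6], "piano": [0.0, 1.0]}
p_lsa = lsa_probs([1.0, 0.2], word_vecs)
p_ngram = {"bank": 0.5, "river": 0.2, "piano": 0.3}
p_combined = geometric_combine(p_ngram, p_lsa, lam=0.3)
```

In this sketch a larger `lam` shifts probability toward words that are semantically close to the context, and a confidence metric could set `lam` separately for each prediction instead of fixing it globally.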

Committee: James Martin, Associate Professor (Chair)
Daniel Jurafsky, Stanford University
Clayton Lewis, Professor
Wayne Ward, Center for Spoken Language Research
Thomas Landauer, Department of Psychology

May 5, 2012 (13:40)