
Thesis Defense - Hagen

Advances in Children's Speech Recognition with Application to Interactive Literacy Tutors
Andreas Hagen
Computer Science PhD Candidate

Speech technology offers great opportunity in the field of automated literacy and reading tutors for children. In such applications, speech recognition can be used to track the child's reading position, detect oral reading miscues, and even play an important role in assessing comprehension or engaging the child in interactive dialogs for learning. Despite this promise, speech recognition systems exhibit higher error rates for children due to variability in vocal tract length, formant frequency, pronunciation, and grammar. When children read aloud, these problems are compounded by difficulties in recognizing words, which affect speech production and cause pauses, repeated syllables, and other phenomena.

This thesis presents a novel set of speech recognition techniques that improve accuracy and modeling capability in the context of an interactive literacy tutor for children and that can be applied to improve oral reading tracking. First, it is demonstrated that speech recognition error rates for interactive read-aloud can be reduced by more than 45% through a combination of advances in both statistical language modeling and acoustic modeling. Next, this thesis proposes extending the baseline system by introducing a novel token-passing search architecture targeting subword-unit-based speech recognition. The proposed subword-unit-based speech recognition framework is shown to provide accuracy equivalent to that of a whole-word-based speech recognizer while enabling detection of oral reading events and finer-grained speech analysis during recognition.

The efficacy of the approach is demonstrated using data collected from children in 3rd through 5th grade: 39.4% of partial words with reasonable evidence in the speech signal are detected at a low false alarm rate of 0.9%. Subword-unit-based speech recognition is then extended to a large vocabulary task, and its advantages for tight search beams are demonstrated in comparison with word-based recognition. Finally, subword units are shown to represent a valuable pool of potential distractors in the language modeling part of pronunciation verification tasks.

Committee: Bryan Pellom, Research Assistant Professor (Chair)
Ronald Cole, Research Professor
Kadri Hacioglu, Center for Spoken Language Research
James Martin, Associate Professor
Wayne Ward, Research Professor
Barbara Wise, Center for Spoken Language Research

This defense will be held at the Center for Spoken Language Research in the Center for Innovation and Creativity at 1777 Exposition Drive.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:20)