Colloquium - Mozer

Neural Network Speech Processing for Toys and Consumer Electronics
Department of Computer Science and Sensory Circuits, Inc.

The ongoing challenge in speech research is recognizing continuous, unconstrained speech. In comparison, isolated word recognition with small vocabularies is easy. Many commercial efforts are aimed at the high-end problem. Sensory Circuits has successfully focused on the low end, producing a family of low-cost speech recognition chips for toys, consumer electronics, electronic learning aids, and home appliances.

The chips are based on a 4 MIPS 8-bit microcontroller with on-board AGC, A/D, D/A, and digital filtering. The microcontroller can be programmed for speaker-independent or speaker-dependent recognition, voice verification (recognizing a stored password spoken by a particular speaker), polyphonic music synthesis, speech synthesis, and voice record and playback, and it has enough power to drive and communicate with the application product.

The speech recognition and voice verification products are neural-network based. Speaker-independent recognition of vocabularies of up to 10 words achieves accuracies of 95-98%. Speaker-dependent recognition of vocabularies of up to 20 items has an accuracy greater than 99%. The chip can be programmed to handle larger vocabularies by context-dependent switching of recognition sets.
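
As a rough illustration of context-dependent switching of recognition sets, the Python sketch below keeps only a small active vocabulary per context and lets each recognized word select the next context. The contexts, words, and function names are hypothetical, and the per-word scores stand in for whatever the recognizer outputs; this is not a description of the chip's firmware.

```python
# Hypothetical recognition sets: each context maps its active words to the
# context that becomes active after that word is recognized.
RECOGNITION_SETS = {
    "main": {"lights": "lights", "music": "music"},
    "lights": {"on": "main", "off": "main", "dim": "main"},
    "music": {"play": "main", "stop": "main", "louder": "music", "softer": "music"},
}


def recognize(scores, context):
    """Return the best-scoring word, considering only the active set."""
    active = RECOGNITION_SETS[context]
    candidates = {word: s for word, s in scores.items() if word in active}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def step(scores, context):
    """Recognize one word and return (word, next_context)."""
    word = recognize(scores, context)
    if word is None:
        return None, context  # nothing in the active set; stay put
    return word, RECOGNITION_SETS[context][word]
```

Here the overall vocabulary has nine words, but no single recognition pass has to discriminate among more than four of them.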

The neural net architectures are fairly standard, but the hardware and real-world usage impose some interesting challenges, including speed constraints that necessitate integer arithmetic, very limited RAM, and on-line speaker adaptation. I will demo various products with neural network speech recognition.
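
To give a sense of what integer-only arithmetic means for a neural net unit, the Python sketch below computes a weighted sum using signed 8-bit values. The fixed-point format (7 fractional bits), the saturation to the 8-bit range, and the function names are assumptions chosen for illustration; the abstract does not say which scheme the chips use.

```python
Q = 7              # assumed number of fractional bits
SCALE = 1 << Q     # 128: the integer that represents 1.0


def to_fixed(x):
    """Quantize a float in roughly [-1, 1) to a signed 8-bit integer."""
    return max(-128, min(127, int(round(x * SCALE))))


def fixed_dot(weights, inputs):
    """Integer multiply-accumulate, rescaled to Q fractional bits and saturated."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc += w * x           # each product carries 2*Q fractional bits
    acc >>= Q                  # drop back to Q fractional bits
    return max(-128, min(127, acc))  # saturate to the 8-bit range


weights = [to_fixed(0.5), to_fixed(-0.25)]
inputs = [to_fixed(0.5), to_fixed(0.5)]
result = fixed_dot(weights, inputs)  # compare result / SCALE with the float sum
```

The accumulator is wider than 8 bits on purpose: the products need the extra range before they are shifted back down.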

Refreshments will be served immediately before the talk at 3:30pm.
Hosted by Dirk Grunwald.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:13)