Research into how humans produce and perceive speech promises handsome technological payoffs, from machines that recognize, synthesize and compress speech to smaller, cheaper and more effective hearing aids. But speech-processing researchers have pursued these goals for decades, with mixed results. "There's still a lot to be done," says Abeer Alwan of UCLA's Department of Electrical Engineering.

Nevertheless, Alwan, who came to UCLA in 1992 to establish the Speech Processing & Auditory Perception Laboratory, has begun to make headway by adopting new approaches to old problems.

Most speech perception studies, for instance, have taken place in a quiet laboratory setting. In a more natural environment—one that includes even a small amount of background noise—speech recognition devices often perform poorly. Similarly, hearing-aid users know all too well the difficulties encountered when competing sounds intrude.

"Exactly what enables us to understand speech in the presence of noise is still an open question," says Alwan. A primary focus of her work is to seek answers to that and other challenging questions through the development of quantitative models of human speech production and perception mechanisms.

Speech is degraded both by the sounds of other voices (the so-called cocktail-party effect) and by nonspeech signals, such as car noise. With a Research Initiation Award from the National Science Foundation and a FIRST Career Development Award from the National Institutes of Health, Alwan is developing algorithms for speech recognition and compression that account for background noise. To do so, she is drawing on the properties of the human auditory system.

Our ears respond differently to the slam of a door in a quiet room than in a noisy one because the auditory system adapts, Alwan notes. Likewise, humans are more sensitive to the peaks in a speech signal's frequency spectrum than to the valleys between them; a realistic speech perception system, then, would accentuate the peaks.
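As a rough illustration of what "accenting the peaks" can mean in practice, the sketch below widens the contrast between each bin of a magnitude spectrum and the average of its neighbors. It is a toy stand-in, not Alwan's auditory model: the function name, the neighborhood width and the exponent are all hypothetical choices made for this example.

```python
def emphasize_peaks(spectrum, gamma=2.0, half_width=2):
    """Return a copy of a magnitude spectrum with peaks accentuated relative to valleys.

    Hypothetical toy method: each bin is compared with the mean of its
    neighborhood (a crude local envelope). The ratio is raised to `gamma`,
    so bins above their local mean are boosted and bins below it are
    attenuated. With gamma == 1 the input is returned essentially unchanged.
    """
    n = len(spectrum)
    out = []
    for i, value in enumerate(spectrum):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        local_mean = sum(spectrum[lo:hi]) / (hi - lo)
        if local_mean == 0:
            # A silent neighborhood stays silent.
            out.append(0.0)
            continue
        ratio = value / local_mean
        out.append(local_mean * ratio ** gamma)
    return out


# A single spectral peak flanked by flat valleys.
print(emphasize_peaks([1.0, 1.0, 4.0, 1.0, 1.0]))
```

In this example the peak grows and the neighboring valleys shrink, so the peak-to-valley contrast increases while the peak stays in the same place; a larger `gamma` exaggerates the effect further.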

Alwan and her students have quantified such auditory mechanisms mathematically and are now applying auditory models to two perceptually based systems: a variable-rate speech and audio coder, which is particularly suitable for wireless transmission, and a word-recognition system. By using properties of the auditory system, Alwan's group has been able to optimize the performance of these systems in naturally noisy environments.
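The basic idea of a variable-rate coder is to spend bits only where listeners will notice them. The sketch below shows that principle in its crudest form: each frame gets a bit budget according to how far its level rises above an estimated noise floor, and frames barely above the floor are treated as if masked by the noise. This is an assumption-laden illustration, not the UCLA coder; the rate table, thresholds, noise-floor estimate and function names are all hypothetical.

```python
import math

# Hypothetical bit budgets per frame, chosen only for illustration.
RATE_BITS = {"near_floor": 8, "low": 40, "high": 120}


def frame_level_db(frame):
    """Mean-square level of one frame of samples, in dB (floored to avoid log(0))."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))


def choose_frame_rates(frames, near_floor_db=3.0, high_margin_db=20.0):
    """Pick a bit budget for each frame from its level above an estimated noise floor.

    The noise floor is crudely taken as the quietest frame's level. Frames
    within `near_floor_db` of it are treated as masked by the noise and get
    the smallest budget; frames more than `high_margin_db` above it get the
    largest; everything in between gets the middle budget.
    """
    levels = [frame_level_db(f) for f in frames]
    noise_floor = min(levels)
    rates = []
    for level in levels:
        excess = level - noise_floor
        if excess < near_floor_db:
            rates.append(RATE_BITS["near_floor"])
        elif excess < high_margin_db:
            rates.append(RATE_BITS["low"])
        else:
            rates.append(RATE_BITS["high"])
    return rates


# Three constant-amplitude frames: background only, soft speech, loud speech.
frames = [[0.01] * 160, [0.05] * 160, [0.5] * 160]
print(choose_frame_rates(frames))
```

A perceptually based coder would replace the simple level comparison with an auditory model of what is actually audible in the presence of noise; the structure (measure, compare with a threshold, allocate bits) is what this sketch is meant to convey.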

On the speech production side, Alwan is collaborating with researchers at Cedars-Sinai Medical Center and AT&T Labs on a project that uses magnetic resonance imaging (MRI) of the vocal tract to construct physiologically based three-dimensional computer models for how speech is produced. The work is supported by a CAREER Award from the National Science Foundation.

Most previous studies of vocal-tract geometry during speech relied on lateral X-ray data, but such studies are limited not only by the inherent radiation risk but also by the difficulty of inferring cross-sectional shape from midsagittal profiles. "MRI has eliminated the guesswork," Alwan notes. While a handful of small MRI studies had been done previously, her group has amassed what is likely the world's largest MRI database for these purposes, covering the gamut of speech sounds. "This is a noninvasive way to obtain important speech production information," Alwan asserts.

The images have revealed new evidence of similarities and differences among speakers' tongue shapes during speech, with direct acoustic consequences. "We have started to see differences that have never been documented in the way people make certain sounds," Alwan says. Using the geometries derived from this unified set of measurements for each sound, Alwan has refined a mathematical model of sound generation that can be used in speech synthesizers. Alwan believes the same data could also help speech pathologists. "Right now, when they train a person to say a certain sound, they only have an idea of what the tongue is supposed to do," she notes.
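Why does vocal-tract shape have direct acoustic consequences? The simplest textbook case makes the link concrete: idealize the tract as a uniform tube, closed at the glottis and open at the lips. Such a tube resonates at odd multiples of a quarter wavelength, F_n = (2n - 1)c / (4L), and those resonances are the formants. This is a standard approximation, not Alwan's MRI-based model, which works from measured three-dimensional shapes; the function name and the 350 m/s speed of sound are assumptions chosen for illustration.

```python
def uniform_tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L), where c is
    the speed of sound (m/s, assumed 350 for warm, moist air) and L the tube
    length in meters.
    """
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_formants + 1)]


# A 17.5 cm tube: formants of a neutral, schwa-like vowel.
print([round(f) for f in uniform_tube_formants(0.175)])  # prints [500, 1500, 2500]
```

A real tongue shape turns the uniform tube into a tube whose cross-sectional area varies along its length, which moves these resonances up or down. That is why speaker-to-speaker differences in tongue shape, of the kind the MRI images reveal, show up in the sound itself.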

Alwan suspects that as more is learned about the human brain, the applications of her work will expand dramatically. "We all hope for the day when we can understand exactly how the neurons are firing in response to given speech sounds, and what the brain does to decode the information," she says. "Once we know how that's done, we'll be able to apply that knowledge to speech-processing systems, but more importantly, it will help us to understand certain speech and hearing impairments."

—Dan Gordon


Noises Off

The key to auditory perception may ultimately lie not in how we hear, but how we speak

   

1998