

Kevin Donohue
Enhancing the Human-Computer Audio Interface

by Jeff Worley & Robin Roenker

Minutes before a crucial presentation, a dozen colleagues pace around a conference room making final adjustments to their design. They call up spreadsheets, graphic elements, and other data from their computer—but without a single tap at a keyboard or click of the mouse. The room itself is their link to their computer files, and with only a spoken command, the requested information appears.

It may sound like something out of Star Trek, but so-called smart rooms, capable of recognizing human gestures and voice commands, will enable just such a seamless human-computer interface. And such environments are a not-so-distant reality.

While the majority of his colleagues at UK's Center for Visualization and Virtual Environments work on the visual aspects of smart-room technology, Kevin Donohue, associate professor of electrical and computer engineering, focuses on the audio component.

His research centers on microphone arrays, groups of synchronized microphones distributed within a room. Microphone arrays can record sound just like traditional single-element microphones, but unlike traditional mics, they also capture the relative time it takes sound from a source to reach each microphone. From these differences in arrival time, an array can determine where in the room a sound is coming from and tell apart sounds from different sources.
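The localization idea can be sketched in a few lines. The example below is a minimal illustration, not Donohue's software: it assumes two microphones 0.5 m apart, a 16 kHz sample rate, and a distant source, estimates the arrival-time difference from the peak of the cross-correlation between the two recordings, and converts that delay into a bearing. Real rooms add echoes and noise, so practical systems typically use more robust estimators (GCC-PHAT is a common choice). The function names and parameter values here are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C

def estimate_delay(ref, sig, fs):
    """Return how many seconds `sig` lags `ref`, taken from the peak of their
    cross-correlation (a basic time-difference-of-arrival estimate)."""
    corr = np.correlate(sig, ref, mode="full")
    lag = int(np.argmax(corr)) - (len(ref) - 1)  # output index -> lag in samples
    return lag / fs

def bearing_from_delay(tau, spacing):
    """Far-field arrival angle (radians from broadside) for a two-mic pair."""
    ratio = SPEED_OF_SOUND * tau / spacing
    return np.arcsin(np.clip(ratio, -1.0, 1.0))  # clip guards against noise

# Synthetic test: broadband noise that reaches mic 2 twelve samples after mic 1.
FS = 16_000                                   # assumed sample rate (Hz)
rng = np.random.default_rng(0)
source = rng.standard_normal(4000)
mic1 = source
mic2 = np.concatenate([np.zeros(12), source[:-12]])

tau = estimate_delay(mic1, mic2, FS)
angle = np.degrees(bearing_from_delay(tau, spacing=0.5))  # mics 0.5 m apart (assumed)
print(f"delay = {tau * 1e3:.3f} ms, bearing = {angle:.1f} deg")
```

With more than two microphones, delays measured across several pairs can be combined to locate a source in two or three dimensions rather than only its bearing.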

Consequently, microphone arrays could eventually allow computers in smart-room applications to differentiate between the voice commands of different speakers—allowing each individual access to only his or her own files.

What's more, microphone arrays can follow sound sources as they move, effectively providing a dedicated mobile microphone for each speaker in the room. The cleaner signal, in turn, improves the accuracy of the computer's word- and voice-recognition capabilities, which require high-quality input to perform well.

"Arrays can definitely result in a much clearer signal, which will certainly help with voice recognition and word recognition," says Donohue. "If we focus the array on the speaker, it has the ability to attenuate much of the unnecessary background noise and really clean up the sound."
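Focusing an array on a talker is commonly done with delay-and-sum beamforming: each channel is shifted so that sound from the chosen point lines up across microphones, and the channels are then averaged. Sound from that point adds coherently while uncorrelated noise partly cancels, so with M microphones and independent noise the signal-to-noise ratio improves by roughly 10 log10(M) dB. The sketch below illustrates the technique under stated assumptions (eight microphones 10 cm apart in a line, a hypothetical talker position, whole-sample delays, and white noise standing in for speech); the article does not say which beamformer Donohue's group uses.

```python
import numpy as np

C = 343.0      # speed of sound, m/s
FS = 16_000    # assumed sample rate, Hz

def steering_delays(mics, point):
    """Propagation delay, in whole samples, from `point` to each microphone."""
    dist = np.linalg.norm(mics - point, axis=1)
    return np.round(dist / C * FS).astype(int)

def delay(x, n):
    """Delay x by n >= 0 samples, zero-padding the start."""
    return np.concatenate([np.zeros(n), x[:len(x) - n]])

def advance(x, n):
    """Advance x by n >= 0 samples, zero-padding the end."""
    return np.concatenate([x[n:], np.zeros(n)])

def delay_and_sum(channels, mics, point):
    """Align every channel on sound arriving from `point`, then average."""
    d = steering_delays(mics, point)
    aligned = [advance(ch, k) for ch, k in zip(channels, d - d.min())]
    return np.mean(aligned, axis=0)

rng = np.random.default_rng(1)
mics = np.column_stack([np.arange(8) * 0.1, np.zeros(8)])  # 8 mics, 10 cm apart
talker = np.array([1.0, 2.0])                               # hypothetical position (m)
speech = rng.standard_normal(FS)            # 1 s of noise-like stand-in for speech
clean = [delay(speech, k) for k in steering_delays(mics, talker)]
noise = [rng.standard_normal(FS) for _ in mics]             # independent per mic

# The beamformer is linear, so signal and noise can be passed through separately.
sig_out = delay_and_sum(clean, mics, talker)
noise_out = delay_and_sum(noise, mics, talker)
snr_in = 10 * np.log10(np.var(speech) / np.var(noise[0]))
snr_out = 10 * np.log10(np.var(sig_out) / np.var(noise_out))
print(f"SNR gain = {snr_out - snr_in:.1f} dB")  # about 10*log10(8), i.e. ~9 dB
```

To follow a moving talker, the same function can be applied to successive short blocks of audio with an updated `point`; production systems would also use fractional rather than whole-sample delays.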

So in our example of the dozen workers moving about the room shouting commands simultaneously, a computer equipped with microphone arrays wouldn't miss a beat.

Donohue's research, still in its developmental stages, has so far used an eight-element microphone array. He hopes eventually to expand that to 1,000 microphones, a move that will require a 32-node parallel computer cluster to process the hundreds of thousands of audio samples the microphones take in each second.
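The computing load grows in direct proportion to the number of channels. As a rough sketch, assuming 16 kHz, 16-bit samples (the article gives neither figure), the snippet below computes the raw sample rates for the current eight-element array and for a 1,000-microphone array, then splits the 1,000 channels as evenly as possible across 32 nodes. The contiguous, near-equal partition is a hypothetical scheme chosen for illustration, not a description of how Donohue's cluster actually assigns work.

```python
FS = 16_000           # assumed sample rate per microphone (Hz)
BYTES_PER_SAMPLE = 2  # assumed 16-bit samples

def partition_channels(n_mics, n_nodes):
    """Split microphone indices into n_nodes contiguous, near-equal blocks."""
    base, extra = divmod(n_mics, n_nodes)
    blocks, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)  # first `extra` nodes get one more
        blocks.append(range(start, start + size))
        start += size
    return blocks

for n_mics in (8, 1000):
    total = n_mics * FS
    print(f"{n_mics:>5} mics: {total:,} samples/s, "
          f"{total * BYTES_PER_SAMPLE / 1e6:.1f} MB/s")

blocks = partition_channels(1000, 32)
busiest = max(len(b) for b in blocks)
print(f"32 nodes: at most {busiest} mics per node, {busiest * FS:,} samples/s per node")
```

Spreading the channels this way keeps each node's share of the incoming audio roughly equal, which is one simple way a cluster can trade hardwired processing for software flexibility.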

Using computer clusters is the aspect of the research that's new to the field, says Donohue. "These clusters are going to give us a very flexible test bed, as opposed to what people did in the 1990s, where their computing power was hardwired into the system. They had power, but limited flexibility. By interfacing arrays and clusters, we will have both the flexibility and the computing power we are looking for."

About Kevin Donohue

Beyond his work with audio arrays, Kevin Donohue's research interests include medical imaging and image quality analysis. In the future, he intends to collaborate with Christopher Jaynes on visual image processing to connect the audio and visual components of smart-room design.
