Today, users can dial a number simply by speaking the name of the person they would like to contact, or check on a flight by calling a number and talking to a machine. But most people still haven’t used speech technology at all, or use it only infrequently, and those who run into an automated call center often request an operator or simply hang up. In the future, speech technology will be a mainstream way of interacting with software, devices and services, but advances in the state of the art will be needed if we are to achieve that vision. Some of the challenges include making speech recognizers more robust to background noise and varying contexts, and designing user interfaces that are efficient and natural.
In this talk, I will describe what those grand challenges are and some of the work we are doing at Microsoft to address them. To combat background noise, I will talk about a microphone technology that includes a sensor that captures the vibration of the user’s skin. I will also talk about modeling context through a semantic platform. Finally, I will describe our efforts to design multimodal interfaces that are efficient and that simplify recovery from speech recognition errors.
Alex Acero manages the speech group in Microsoft Research and is also an affiliate professor of Electrical Engineering at the University of Washington. He received an engineering degree from the Universidad Politecnica de Madrid in 1985, a Master’s from Rice University in 1987, and a Ph.D. from Carnegie Mellon University in 1990, all in Electrical Engineering. Before joining Microsoft in 1994, he worked in the speech group of Apple Computer (Cupertino, CA) and managed the speech group at Telefonica R&D labs (Madrid, Spain). He holds 9 patents and is the author of the books “Spoken Language Processing” and “Acoustical and Environmental Robustness in Automatic Speech Recognition”. Dr. Acero has written chapters in three edited books and over 90 conference and journal papers. He is a Fellow of the IEEE, a member of the board of governors of the IEEE Signal Processing Society, and an Associate Editor of IEEE Signal Processing Letters and of Computer Speech and Language.