Abstract
Speech separation, popularly known as the cocktail party problem, is a widely acknowledged challenge in speech and signal processing. Motivated by advances in speech perception and computational auditory scene analysis, we have suggested a new formulation of this problem that classifies time-frequency units into two classes: those dominated by the target speech and the rest. Recent separation algorithms that adopt this supervised classification formulation show considerable promise for solving the speech separation problem. In supervised learning, a paramount issue is generalization to conditions unseen during training. This presentation describes novel methods for dealing with the generalization issue when support vector machines (SVMs) are used to estimate the ideal binary mask (IBM). One method employs distribution fitting to adapt to unseen signal-to-noise ratios and iterative voice activity detection to adapt to unseen noises. Another method learns more linearly separable features using deep neural networks (DNNs) and then couples the DNN with a linear SVM for training on a variety of noisy conditions. Systematic evaluations show high-quality IBM estimation in new acoustic environments.
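To make the two-class formulation concrete, the following is a minimal sketch of how an ideal binary mask could be computed when the premixed target and noise energies are available for each time-frequency unit. It is illustrative only and is not the speaker's implementation: the function name `ideal_binary_mask`, the nested-list layout, and the 0 dB default for the local SNR criterion `lc_db` are assumptions made here. In the setting of the talk, the premixed signals are unavailable and a classifier such as an SVM estimates these labels from the mixture.

```python
import math


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Label each time-frequency unit 1 (target-dominated) or 0 (the rest).

    target_energy and noise_energy are hypothetical inputs: equally shaped
    nested lists holding the premixed energy of each time-frequency unit.
    lc_db is the local SNR criterion in dB (0 dB is an assumed default).
    """
    mask = []
    for t_row, n_row in zip(target_energy, noise_energy):
        row = []
        for t, n in zip(t_row, n_row):
            if t <= 0.0:
                # No target energy: the unit cannot be target-dominated.
                row.append(0)
            elif n <= 0.0:
                # Target energy with no noise: trivially target-dominated.
                row.append(1)
            else:
                local_snr_db = 10.0 * math.log10(t / n)
                row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask


# Toy example: two time frames by two frequency channels.
target = [[4.0, 0.5], [1.0, 9.0]]
noise = [[1.0, 2.0], [1.0, 3.0]]
mask = ideal_binary_mask(target, noise)
```

Lowering `lc_db` labels more units as target-dominated, so the criterion trades retained speech against retained noise.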
Biography
DeLiang Wang received the B.S. degree in 1983 and the M.S. degree in 1986 from Peking (Beijing) University, Beijing, China, and the Ph.D. degree in 1991 from the University of Southern California, Los Angeles, CA, all in computer science.
From July 1986 to December 1987 he was with the Institute of Computing Technology, Academia Sinica, Beijing. Since 1991, he has been with the Department of Computer Science and Engineering and the Center for Cognitive Science at The Ohio State University, Columbus, OH, where he is currently a Professor. From October 1998 to September 1999, he was a visiting scholar in the Department of Psychology at Harvard University, Cambridge, MA. From October 2006 to June 2007, he was a visiting scholar at Oticon A/S, Copenhagen, Denmark.
DeLiang Wang received the NSF Research Initiation Award in 1992 and the ONR Young Investigator Award in 1996. He received the OSU College of Engineering Lumley Research Award in 1996, 2000, 2005, and 2010. His 2005 paper, “The time dimension for scene analysis”, received the IEEE Transactions on Neural Networks Outstanding Paper Award from the IEEE Computational Intelligence Society. He also received the 2008 Helmholtz Award from the International Neural Network Society. He was an IEEE Distinguished Lecturer (2010-2012), and is an IEEE Fellow.
He is Co-Editor-in-Chief of Neural Networks, a premier journal published by Elsevier. In addition, he serves on the editorial/advisory boards of Cognitive Computation, Cognitive Neurodynamics, Neural Computing and Applications, EURASIP Journal on Audio, Speech, & Music Processing, and IEEE Transactions on Audio, Speech, & Language Processing. He served as President of the International Neural Network Society in 2006, and currently serves on its governing board.


