Deep learning, or hierarchical learning, is an emerging area of neural information processing, machine learning, and artificial intelligence in which many layers of nonlinear information processing stages, arranged in brain-like hierarchical architectures, are exploited for pattern classification and/or unsupervised feature learning. At MSR, we have pioneered the development of a set of deep architectures and related learning techniques, and have successfully applied them to drastically improve the quality of phone recognition, voice search, spontaneous speech recognition, intent determination via semantic utterance classification (i.e., speech understanding), speech feature coding, and handwriting recognition, often surprisingly outperforming state-of-the-art methods. Other researchers (e.g., from Google Research, IBM Research, Univ. of Toronto, Stanford Univ., NYU, UW, and U. Montreal) have also demonstrated the success of deep learning in phonetic and speech recognition, audio processing, NLP, language modeling, computer vision, information retrieval, and robotics. This talk will review both the theory and the applications of deep learning, and analyze its future directions. To organize the review material, I develop a classification scheme to examine and summarize the major work reported in the literature. Using this scheme, I will give a taxonomy-oriented overview of the existing deep architectures and algorithms, which are categorized into three classes: generative, discriminative, and hybrid. One or two successful applications from each of these classes will be described in some detail, focusing on recent work carried out at MSR.
Li Deng received his Ph.D. from Univ. Wisconsin-Madison. He was an Assistant, Associate, and Full Professor at Univ. Waterloo, Canada (1989-1999). He then joined Microsoft Research, Redmond, where he is currently Principal Researcher and where he has received Microsoft Research Technology Transfer and Achievement Awards. Prior to MSR, he also worked or taught at Massachusetts Institute of Technology, ATR Interpreting Telecom. Research Lab. (Kyoto, Japan), and HKUST. He has published over 300 refereed papers in leading journals and conferences, and 3 books. He is a Fellow of the Acoustical Society of America, a Fellow of the IEEE, and a Fellow of the International Speech Communication Association. He is an inventor or co-inventor of over 50 granted US, Japanese, or international patents. He served on the Board of Governors of the IEEE Sig. Proc. Soc. (2008-2010). More recently, he served as Editor-in-Chief of IEEE Signal Processing Magazine (2009-2011), for which he received the IEEE SPS Meritorious Service Award. According to the Thomson Reuters Journal Citation Reports released in 2010 and 2011, the magazine ranked first in both years, in terms of impact factor, among all 127 IEEE publications and all 247 publications within the Electrical and Electronics Engineering Category worldwide. He currently serves as Editor-in-Chief of IEEE Transactions on Audio, Speech and Language Processing.