14:00 - 15:00
The Switchboard-1 Telephone Speech Corpus was originally collected by Texas Instruments in 1990-91, under DARPA sponsorship, and marked the beginning of over 25 years of intensive effort in conversational speech recognition. Recently, we have measured the ability of professional transcribers to transcribe this sort of data, and found that our latest systems have achieved the same level of performance. In this talk, I will describe the key technological advances that have made this possible: the systematic use of CNN and LSTM models in both acoustic and language modeling, as well as the extensive use of system combination. The talk will also provide an analysis of the errors made by people and computers, which shows substantially similar error patterns, with the exception of confusions between backchannel acknowledgments and hesitations.
Geoffrey Zweig is a Partner Research Manager at Microsoft, in charge of the Speech & Dialog Research Group, and responsible for advancing the state-of-the-art in speech recognition and dialog systems. Recent work includes the development of the Custom Recognition Intelligent Service (CRIS) and the Language Understanding Intelligent Service (LUIS) in Microsoft’s Cognitive Services suite, as well as ground-breaking performance on the conversational speech recognition task. Supporting this work is Microsoft’s CNTK neural net toolkit, developed by researchers on Dr. Zweig’s team.
Prior to joining Microsoft in 2006, Dr. Zweig worked at IBM Research for eight years, most recently as the manager of the Advanced LVCSR research group, where he led a team of researchers to develop English, Arabic and Mandarin speech recognition systems for the DARPA EARS and GALE programs. Dr. Zweig received his PhD in 1998 from the Computer Science Department of the University of California at Berkeley. He has served as Associate Editor of IEEE-TASLP, is currently on the editorial board of CSL, and is an IEEE Fellow. Dr. Zweig has published over 80 papers and holds numerous patents for his work.
14:00 - 15:00
In the last decade we have witnessed machine learning trigger a revolution in dialogue research. Using a variety of reinforcement and supervised learning methods and innovative architectures, we can now build fully data-driven dialogue systems. These techniques are part of a user-in-the-loop framework, where systems can be deployed quickly, handle speech recognition errors gracefully and learn continuously from interaction with real users.
Methods based on Gaussian processes are particularly effective as they enable good models to be estimated from limited training data. Furthermore, they provide an explicit estimate of uncertainty, which is particularly useful for reinforcement learning. This talk explores the additional steps that are necessary to extend these methods to support adaptation to different dialogue domains, an important step for scaling up and building evolving systems.
The final part of the talk will focus on the evolution of the next generation of spoken dialogue systems. These systems will need to operate on large and dynamic domains and, more importantly, be capable of conducting rich and natural interaction. We will present a research roadmap towards this goal, and illustrate the need for such research with a mental health application, a typical domain where this level of complexity is required.
Milica Gašić graduated in Computer Science and Mathematics from the University of Belgrade in 2006. Since then, she has been at the University of Cambridge. After completing an MPhil course in Computer Speech, Text and Internet Technology at the Computer Laboratory, she enrolled as a PhD student in Statistical Dialogue Modelling under the supervision of Professor Steve Young at the Engineering Department. In 2011, she was awarded an EPSRC PhD plus award for her dissertation and became a Research Associate in the Dialogue Systems Group. She was a technical area leader for the Parlance project, an EU FP7 project, from 2012 to 2014. In 2014, she was elected a Research Fellow of Murray Edwards College, University of Cambridge. In April 2016, she was appointed a Lecturer in Spoken Dialogue Systems at the Department of Engineering, University of Cambridge. She has published around 50 journal articles and peer-reviewed conference papers, and has received a number of best paper awards: CSL (2010), Interspeech (2010), SLT (2010), Sigdial (2013), Sigdial (2015), EMNLP (2015) and ACL (2016).