Invited Talks

Invited talk #1, Fei Sha, "Large-scale Kernel Methods for Acoustic Modeling"

Invited talk #2, Lukas Burget, "Subspace Modeling Techniques in Speech and Language Processing"

Invited talk #1, "Large-scale Kernel Methods for Acoustic Modeling"

Monday, December 8, 13:00-14:00

Room: Emerald A-B

Fei Sha (University of Southern California)


Kernel methods, such as (nonlinear) support vector machines, are powerful modeling tools. However, their computational complexity often prevents their capabilities from being fully exploited for problems at the scale that is typical in speech processing and recognition.

In this talk, I will describe how to skillfully combine a few existing and new techniques to overcome this challenge. I will show that for the task of acoustic modeling, kernel methods, when scaled up, perform competitively with alternatives such as deep neural networks. This is an exciting development, as it opens up new routes and tools for addressing difficult problems in speech and language processing.

This talk is based on a joint effort across several groups: USC (myself and my students), Columbia (Michael Collins and his students), and IBM T. J. Watson Research (Brian Kingsbury and Michael Picheny).

About the speaker

Dr. Fei Sha is the Jack Munushian Early Career Chair and an associate professor in the Dept. of Computer Science at the University of Southern California. His primary research interests are machine learning and its application to speech and language processing, computer vision, and robotics. He won outstanding student paper awards at NIPS 2006 and ICML 2004. He was selected as a Sloan Research Fellow in 2013, won an Army Research Office Young Investigator Award in 2012, and was a member of the DARPA 2010 Computer Science Study Panel. He holds a Ph.D. (2007) in Computer and Information Science from the University of Pennsylvania and a B.Sc. and an M.Sc. from Southeast University (Nanjing, China).

Invited talk #2, "Subspace Modeling Techniques in Speech and Language Processing"

Tuesday, December 9, 13:00-14:00

Room: Emerald A-B

Lukas Burget (Brno University of Technology)


Recently introduced subspace modeling techniques have revolutionized the field of speaker recognition. Dramatic improvements were observed in both speed and accuracy, which have increased the scale of viable speaker-id systems by several orders of magnitude. In this talk, we review the concept of so-called "i-vectors", which represent sequences of continuous speech features by low-dimensional fixed-length vectors. We review the recent applications of i-vectors to other speech processing areas and problems. We will also discuss recent extensions and variations of the i-vector concept, which might be of interest to researchers from related communities such as Natural Language Processing (NLP) or Spoken Language Understanding. For example, a “subspace n-gram model” was recently introduced to model sequences of discrete features in prosodic speaker recognition or to represent phonotactics in language recognition. This model allows us to represent high-dimensional n-gram statistics by low-dimensional continuous-valued vectors, which can then be further modeled by very simple statistical models. We will point out some advantages of these representations compared to similar techniques (e.g. Probabilistic Latent Semantic Analysis) that are used in the speech and language community.

About the speaker

Lukas Burget (Ing. [MS], Brno University of Technology, 1999; Ph.D., Brno University of Technology, 2004) is an assistant professor at the Faculty of Information Technology, Brno University of Technology, Czech Republic. He serves as the scientific director of the Speech@FIT research group. From 2000 to 2002, he was a visiting researcher at OGI Portland, USA, and from 2011 to 2012 he spent his sabbatical leave at SRI International, Menlo Park, USA. Lukas was invited to lead the “Robust Speaker Recognition over Varying Channels” team at the JHU CLSP summer workshop in 2008, and the team at the BOSARIS workshop in 2010. His scientific interests are in the field of speech processing, namely acoustic modeling for speech, speaker, and language recognition. He has authored or co-authored more than 110 papers in journals and conferences. Lukas led teams that were successful in the NIST LRE 2005 and 2007 and the NIST SRE 2006 and 2008 evaluations.