SLT 2018 Paper Detail

Presentation #3
Session: Detection, Paralinguistics and Coding
Session Time: Wednesday, December 19, 13:30 - 15:30
Presentation Time: Wednesday, December 19, 13:30 - 15:30
Presentation: Poster
Topic: Speaker/language recognition
Paper Title: IMPROVED CONDITIONAL GENERATIVE ADVERSARIAL NET CLASSIFICATION FOR SPOKEN LANGUAGE RECOGNITION
Authors: Xiaoxiao Miao; The University of Kent / Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics / University of Chinese Academy of Sciences 
 Ian McLoughlin; The University of Kent 
 Shengyu Yao; Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics / University of Chinese Academy of Sciences 
 Yonghong Yan; Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics / University of Chinese Academy of Sciences / Xinjiang Key Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences 
Abstract: Recent research on generative adversarial nets (GAN) for language identification (LID) has shown promising results. In this paper, we further exploit the latent abilities of GANs, first by combining them with deep neural network (DNN)-based i-vector approaches and then by improving the LID model using conditional generative adversarial net (cGAN) classification. First, phoneme-dependent deep bottleneck features (DBF), combined with output posteriors of a DNN pre-trained for automatic speech recognition (ASR), are used to extract i-vectors in the normal way. These i-vectors are then classified using a cGAN, and we show an effective method within the cGAN to optimize parameters by combining both language identification and verification signals as supervision. Results show, firstly, that cGAN methods can significantly outperform DBF DNN i-vector methods when 49-dimensional i-vectors are used, but not when 600-dimensional vectors are used. Secondly, training a cGAN discriminator network for direct classification brings further benefit for low-dimensional i-vectors, as well as for short utterances with high-dimensional i-vectors. However, incorporating a dedicated discriminator network output layer for classification and optimizing both classification and verification losses brings benefits in all test cases.
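
The combined supervision described in the abstract can be pictured as a discriminator with two output heads: an adversarial (verification) head and a dedicated language-classification head whose losses are optimized jointly. The following is a minimal, hypothetical PyTorch sketch of such a two-head cGAN discriminator for i-vector input; the layer sizes, number of target languages, and loss weighting are illustrative assumptions and do not reproduce the authors' configuration.

# Illustrative sketch only (not the paper's code): a cGAN-style discriminator
# over i-vectors with a verification head and a language-classification head.
import torch
import torch.nn as nn

IVEC_DIM = 49        # low-dimensional i-vectors, as discussed in the abstract
NUM_LANGS = 10       # hypothetical number of target languages
HIDDEN = 512         # hypothetical hidden-layer width

class CGANDiscriminator(nn.Module):
    """Discriminator with two heads: a verification (real/generated) output
    and a dedicated language-classification output layer."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            # Condition on the language label via a one-hot vector concatenated
            # to the i-vector, in the usual cGAN fashion.
            nn.Linear(IVEC_DIM + NUM_LANGS, HIDDEN),
            nn.LeakyReLU(0.2),
            nn.Linear(HIDDEN, HIDDEN),
            nn.LeakyReLU(0.2),
        )
        self.verif_head = nn.Linear(HIDDEN, 1)          # real vs. generated
        self.class_head = nn.Linear(HIDDEN, NUM_LANGS)  # language identification

    def forward(self, ivec, cond_onehot):
        h = self.trunk(torch.cat([ivec, cond_onehot], dim=1))
        return self.verif_head(h), self.class_head(h)

def discriminator_loss(verif_logit, class_logit, is_real, lang_label, alpha=1.0):
    """Combine the verification (adversarial) and identification signals;
    alpha is an assumed weighting between the two losses."""
    verif_loss = nn.functional.binary_cross_entropy_with_logits(
        verif_logit.squeeze(1), is_real.float())
    class_loss = nn.functional.cross_entropy(class_logit, lang_label)
    return verif_loss + alpha * class_loss

# Example forward pass on a dummy batch of "real" i-vectors.
if __name__ == "__main__":
    d = CGANDiscriminator()
    ivec = torch.randn(8, IVEC_DIM)
    labels = torch.randint(0, NUM_LANGS, (8,))
    cond = nn.functional.one_hot(labels, NUM_LANGS).float()
    verif_logit, class_logit = d(ivec, cond)
    loss = discriminator_loss(verif_logit, class_logit, torch.ones(8), labels)
    print(loss.item())

At test time, only the classification head would be needed to assign a language to each i-vector, which is one reading of the "direct classification" use of the discriminator mentioned in the abstract.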