Technical Program

Paper Detail

Presentation #11
Session: Speaker Recognition/Verification
Location: Kallirhoe Hall
Session Time: Thursday, December 20, 10:00 - 12:00
Presentation Time: Thursday, December 20, 10:00 - 12:00
Presentation: Poster
Topic: Speaker/language recognition:
Authors: Narumitsu Ikeda, The University of Tokyo, Japan; Yoshinao Sato, Fairy Devices Inc., Japan; Hirokazu Takahashi, The University of Tokyo, Japan
Abstract: Short utterances degrade the performance of conventional speaker recognition systems based on i-vectors, which rely on the statistics of spectral features. To overcome this difficulty, we propose a novel method that exploits the dynamics of the spectral features as well as their distribution. Our model integrates an echo state network (ESN), a type of reservoir computing architecture, with a self-organizing map (SOM), a competitive learning network. The ESN is a single-hidden-layer recurrent neural network with randomly fixed weights that extracts temporal patterns from the spectral features. In the original ESN, the input weights are also fixed randomly; in our model, they are instead trained before enrollment with the unsupervised competitive learning algorithm of the SOM, so that they capture the intrinsic structure of the spectral features. During enrollment, the output weights are trained in a supervised manner to identify an individual within a group of speakers. Our experiment shows that the proposed method performs comparably to or better than a baseline i-vector system for text-independent speaker identification on short utterances.
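The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the SOM topology, the reservoir update rule, the readout training method, or any hyperparameters, so everything below is an assumption: a 1-D SOM whose codebook vectors are used directly as input weights, a leaky tanh reservoir, a ridge-regression readout, frame-averaged scoring, and synthetic Gaussian frames standing in for spectral features. All function names are hypothetical.

```python
import numpy as np

def train_som_input_weights(frames, n_units, epochs=5, lr=0.5, sigma=2.0, seed=0):
    """Unsupervised step (before enrollment): fit a 1-D SOM to the frames.
    Assumption: each codebook vector doubles as one reservoir unit's input weights."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=0.1, size=(n_units, frames.shape[1]))
    positions = np.arange(n_units)
    total_steps = epochs * len(frames)
    step = 0
    for _ in range(epochs):
        for x in frames[rng.permutation(len(frames))]:
            decay = 1.0 - step / total_steps  # shrink learning rate and neighborhood
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            width = sigma * decay + 1e-3
            neighborhood = np.exp(-((positions - winner) ** 2) / (2 * width ** 2))
            weights += (lr * decay) * neighborhood[:, None] * (x - weights)
            step += 1
    return weights

def make_reservoir(n_units, spectral_radius=0.9, seed=1):
    """Random recurrent weights, fixed after rescaling to the given spectral radius."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(n_units, n_units))
    return w * (spectral_radius / np.max(np.abs(np.linalg.eigvals(w))))

def run_reservoir(frames, w_in, w_res, leak=0.3):
    """Drive the reservoir with a frame sequence and collect its states."""
    state = np.zeros(w_res.shape[0])
    states = []
    for x in frames:
        state = (1 - leak) * state + leak * np.tanh(w_in @ x + w_res @ state)
        states.append(state.copy())
    return np.array(states)

def train_readout(states, targets, ridge=1e-2):
    """Supervised step (enrollment): ridge regression from states to one-hot speaker labels."""
    gram = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.T @ targets).T

def identify(frames, w_in, w_res, w_out):
    """Average the per-frame readout over the utterance and pick the best speaker."""
    states = run_reservoir(frames, w_in, w_res)
    return int(np.argmax((states @ w_out.T).mean(axis=0)))

# Synthetic stand-in data: two "speakers" with different feature means.
rng = np.random.default_rng(42)
dim, n_units = 13, 32
speaker_frames = [rng.normal(loc=m, size=(200, dim)) for m in (-1.0, 1.0)]

w_in = train_som_input_weights(np.vstack(speaker_frames), n_units)
w_res = make_reservoir(n_units)
states = np.vstack([run_reservoir(f, w_in, w_res) for f in speaker_frames])
targets = np.repeat(np.eye(2), 200, axis=0)  # one-hot label per frame
w_out = train_readout(states, targets)

short_utterance = rng.normal(loc=1.0, size=(30, dim))
predicted = identify(short_utterance, w_in, w_res, w_out)
print("predicted speaker index:", predicted)
```

Only the readout is fitted per speaker group in this sketch; the SOM-trained input weights and the random recurrent weights stay fixed afterwards, which mirrors the division between pre-enrollment and enrollment training described in the abstract.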