Technical Program

Paper Detail

Presentation #10
Session: Speaker Recognition/Verification
Location: Kallirhoe Hall
Session Time: Thursday, December 20, 10:00 - 12:00
Presentation Time: Thursday, December 20, 10:00 - 12:00
Presentation: Poster
Topic: Speaker/language recognition
Authors: Tanel Alumäe, Tallinn University of Technology, Estonia
Abstract: In this paper, we investigate training speaker recognition models using coarse-grained speaker labels provided only at the recording level. The approach is based on a recently proposed weakly supervised training method that makes it possible to train a speaker recognition deep neural network using a special cost function that does not need segment-level annotations. Experiments are conducted on the VoxCeleb corpus. We show that, without using any reference segment-level labeling, the method can achieve a 1% speaker recognition error rate on the official VoxCeleb closed-set speaker recognition test set, compared with the previously reported 5.4%. By training an x-vector based speaker verification system on the resegmented and relabeled VoxCeleb corpus, we can achieve 4.57% EER on the VoxCeleb speaker verification test set, which is a 17% relative improvement over the best system that uses the official VoxCeleb speaker annotations.
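
The two metrics quoted in the abstract, equal error rate (EER) and relative improvement, can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function names `equal_error_rate` and `relative_improvement` are hypothetical, the trial scores are made-up toy values, and the EER is approximated by a simple threshold sweep rather than by any particular scoring toolkit.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    The EER is the operating point where the false rejection rate (FRR)
    equals the false acceptance rate (FAR). With finitely many trials the two
    rarely match exactly, so we take the threshold where they are closest
    and return their average.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: same-speaker trials scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials scoring at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline, new):
    """Relative reduction of an error metric, as a fraction of the baseline."""
    return (baseline - new) / baseline


# Toy verification trials (illustrative values only, not from the paper).
toy_targets = [0.9, 0.8, 0.7, 0.4]      # same-speaker trial scores
toy_nontargets = [0.6, 0.3, 0.2, 0.1]   # different-speaker trial scores
toy_eer = equal_error_rate(toy_targets, toy_nontargets)
```

Under this definition of relative improvement, the abstract's 4.57% EER and 17% relative gain would correspond to a baseline EER of about 5.5%; that baseline figure is inferred from the two stated numbers and is not given in this listing.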