Technical Program

Paper Detail

Presentation #	10
Session:	Speaker Recognition/Verification
Location:	Kallirhoe Hall
Session Time:	Thursday, December 20, 10:00 - 12:00
Presentation Time:	Thursday, December 20, 10:00 - 12:00
Presentation:	Poster
Topic:	Speaker/language recognition
Paper Title:	TRAINING SPEAKER RECOGNITION MODELS WITH RECORDING-LEVEL LABELS
Authors:	Tanel Alumäe, Tallinn University of Technology, Estonia
Abstract:	In this paper, we investigate training speaker recognition models using coarse-grained speaker labels provided only at the recording level. The approach is based on the recently proposed weakly supervised training method that makes it possible to train a speaker recognition deep neural network using a special cost function that does not need segment-level annotations. Experiments are conducted on the VoxCeleb corpus. We show that, without using any reference segment-level labeling, the method achieves a 1% speaker recognition error rate on the official VoxCeleb closed-set speaker recognition test set, as opposed to the 5.4% that was previously reported. By training an x-vector-based speaker verification system on the resegmented and relabeled VoxCeleb corpus, we achieve 4.57% EER on the VoxCeleb speaker verification test set, which is a 17% relative improvement over the best system that uses the official VoxCeleb speaker annotations.
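The relative figures quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the baseline EER it prints is merely implied by the stated 4.57% and 17% figures, not a number reported in the abstract.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


def implied_baseline(new: float, rel: float) -> float:
    """Baseline value implied by a result `new` and a relative reduction `rel`."""
    return new / (1.0 - rel)


# Closed-set error rate: 5.4% (previously reported) versus 1% (this paper).
closed_set_rel = relative_reduction(5.4, 1.0)

# Verification EER: 4.57% is stated to be a 17% relative improvement.
# The baseline below is derived from those two numbers (an assumption).
eer_baseline = implied_baseline(4.57, 0.17)

print(f"closed-set relative error reduction: {closed_set_rel:.1%}")
print(f"implied baseline EER: {eer_baseline:.2f}%")
```

Relative improvement is measured against the baseline, so the same absolute gain corresponds to a larger relative gain when the baseline error is already small.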