Presentation # | 10 |
Session: | Speaker Recognition/Verification |
Session Time: | Thursday, December 20, 10:00 - 12:00 |
Presentation Time: | Thursday, December 20, 10:00 - 12:00 |
Presentation: |
Poster
|
Topic: |
Speaker/language recognition: |
Paper Title: |
TRAINING SPEAKER RECOGNITION MODELS WITH RECORDING-LEVEL LABELS |
Authors: |
Tanel Alumäe; Tallinn University of Technology | | |
Abstract: |
In this paper, we investigate training speaker recognition models using coarse-grained speaker labels provided only at the recording level. The approach is based on the recently proposed weakly supervised training method that allows to train a speaker recognition deep neural network using a special cost function that doesn't need segment-level annotations. Experiments are conducted on the VoxCeleb corpus. We show that without using any reference segment-level labeling, the method can achieve 1% speaker recognition error rate on the official VoxCeleb closed set speaker recognition test set, as opposed to 5.4% that was previously reported. By training a x-vector based speaker verification system on the resegmented and relabeled VoxCeleb corpus, we can achieve 4.57% EER on the VoxCeleb speaker verification test set which is a 17% relative improvement over the best system that uses the official VoxCeleb speaker annotations. |