Presentation # | 7 |
Session: | Speaker Recognition/Verification |
Session Time: | Thursday, December 20, 10:00 - 12:00 |
Presentation Time: | Thursday, December 20, 10:00 - 12:00 |
Presentation: | Poster |
Topic: |
Speaker/language recognition: |
Paper Title: |
TEACHER-STUDENT TRAINING FOR TEXT-INDEPENDENT SPEAKER RECOGNITION |
Authors: |
Raymond W. M. Ng; Emotech Labs | | |
| Xuechen Liu; Emotech Labs | | |
| Pawel Swietojanski; The University of New South Wales | | |
Abstract: |
This paper investigates text-independent speaker recognition using neural embedding extractors based on the time-delay neural network. Our primary focus is to explore the teacher-student (TS) training framework for knowledge distillation in a text-independent (TI) speaker recognition task. We report the results on both proprietary and public benchmarks, obtaining competitive results with 88-93% smaller models. Particularly, in the clean testing conditions, we find TS training on neural-based TI systems achieved the same or better performance than the i-vector based counterparts. Neural embeddings are less prone to short segment issues, and offer better performance particularly in the high recall setting. They can also provide some additional insights about speakers, such as gender or how difficult a given speaker can be for recognition. |