Technical Program

Paper Detail

Presentation #7
Session: Speaker Recognition/Verification
Location: Kallirhoe Hall
Session Time:Thursday, December 20, 10:00 - 12:00
Presentation Time:Thursday, December 20, 10:00 - 12:00
Presentation: Poster
Topic: Speaker/language recognition:
Authors: Raymond W. M. Ng, Xuechen Liu, Emotech Labs, United Kingdom; Pawel Swietojanski, The University of New South Wales, Australia
Abstract: This paper investigates text-independent speaker recognition using neural embedding extractors based on the time-delay neural network (TDNN). Our primary focus is to explore the teacher-student (TS) training framework for knowledge distillation in a text-independent (TI) speaker recognition task. We report results on both proprietary and public benchmarks, obtaining competitive results with models that are 88-93% smaller. In particular, under clean testing conditions, we find that TS training on neural TI systems achieves the same or better performance than the i-vector-based counterparts. Neural embeddings are less prone to short-segment issues and offer better performance, particularly in the high-recall setting. They can also provide some additional insights about speakers, such as gender or how difficult a given speaker is to recognize.
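The abstract does not state which distillation objective the authors use. As a rough illustration only, the sketch below assumes the common Hinton-style teacher-student loss applied to a speaker-classification head: a temperature-softened KL term that pulls the small student's posteriors toward the large teacher's, mixed with ordinary cross-entropy on the true speaker label. The names `ts_loss`, `temperature` and `alpha` are hypothetical and are not taken from the paper.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, stabilized by subtracting the max logit."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ts_loss(student_logits: List[float], teacher_logits: List[float],
            label: int, temperature: float = 2.0, alpha: float = 0.5) -> float:
    """Hypothetical teacher-student loss (assumed Hinton-style, not the paper's).

    alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label)
    """
    # Soft targets: teacher and student posteriors over training speakers at temperature T.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = sum(pt * (math.log(pt) - math.log(ps))
               for pt, ps in zip(p_teacher, p_student_soft) if pt > 0)
    # Hard target: standard cross-entropy on the true speaker label at T = 1.
    p_student = softmax(student_logits, 1.0)
    hard = -math.log(p_student[label])
    # T^2 keeps the soft-term gradient scale comparable across temperatures.
    return alpha * (temperature ** 2) * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    teacher = [0.5, 2.5, 1.0]   # made-up logits over three training speakers
    student = [0.2, 1.8, 1.1]
    print(ts_loss(student, teacher, label=1))
```

When the student exactly matches the teacher, the KL term vanishes and only the hard-label term remains. In a TDNN embedding extractor of this kind, the speaker embedding would typically be read from a hidden layer after training rather than from the classifier output.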