Paper ID | SPE-8.1 |
Paper Title |
SQUEEZING VALUE OF CROSS-DOMAIN LABELS: A DECOUPLED SCORING APPROACH FOR SPEAKER VERIFICATION |
Authors |
Lantian Li, Yang Zhang, Jiawen Kang, Thomas Fang Zheng, Dong Wang, Tsinghua University, China |
Session | SPE-8: Speaker Recognition 2: Channel and Domain Robustness |
Location | Gather.Town |
Session Time | Tuesday, 08 June, 14:00 - 14:45 |
Presentation Time | Tuesday, 08 June, 14:00 - 14:45 |
Presentation |
Poster |
Topic |
Speech Processing: [SPE-SPKR] Speaker Recognition and Characterization |
Abstract |
Domain mismatch often occurs in real applications and causes serious performance degradation in speaker recognition systems. The common wisdom is to collect cross-domain data and train a multi-domain PLDA model, in the hope of learning a domain-independent speaker subspace. In this paper, we first present an empirical study showing that simply adding cross-domain data does not improve performance under enrollment-test mismatch. Careful analysis shows that this striking result is caused by incoherent statistics between the enrollment and test conditions. Based on this analysis, we present a decoupled scoring approach that squeezes the most value from cross-domain labels and obtains optimal verification scores under enrollment-test mismatch. When the statistics are coherent, the new formulation falls back to conventional PLDA. Experimental results on a cross-channel test show that the proposed approach is highly effective and is a principled solution to domain mismatch. |
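The fallback property mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's formulation: it assumes a one-dimensional two-covariance Gaussian model with a zero-mean speaker prior, and the function names and the parameters `between_var`, `within_var_enroll`, and `within_var_test` are hypothetical. The speaker posterior is computed with enrollment-condition statistics, the test likelihood with test-condition statistics, and the score reduces to a conventional single-condition PLDA log-likelihood ratio when the two within-speaker variances are equal.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def decoupled_llr(enroll, test, between_var, within_var_enroll, within_var_test):
    """Toy decoupled score (assumed model, not the authors' exact method).

    Speaker mean mu ~ N(0, between_var); enrollment samples ~ N(mu, within_var_enroll);
    the test sample ~ N(mu, within_var_test).
    """
    n = len(enroll)
    enroll_mean = sum(enroll) / n
    # Posterior over the speaker mean uses enrollment-condition statistics only.
    denom = n * between_var + within_var_enroll
    post_mean = n * between_var / denom * enroll_mean
    post_var = between_var * within_var_enroll / denom
    # Predictive likelihood of the test sample uses test-condition statistics.
    log_same = gaussian_logpdf(test, post_mean, post_var + within_var_test)
    # Marginal likelihood of the test sample under an unknown speaker.
    log_any = gaussian_logpdf(test, 0.0, between_var + within_var_test)
    return log_same - log_any


def conventional_llr(enroll, test, between_var, within_var):
    """Single-condition reference score with one shared within-speaker variance."""
    n = len(enroll)
    enroll_mean = sum(enroll) / n
    denom = n * between_var + within_var
    post_mean = n * between_var / denom * enroll_mean
    pred_var = within_var + between_var * within_var / denom
    return (gaussian_logpdf(test, post_mean, pred_var)
            - gaussian_logpdf(test, 0.0, between_var + within_var))


if __name__ == "__main__":
    enroll = [1.0, 1.2, 0.8]
    # Mismatched conditions: the test channel is assumed noisier than enrollment.
    print(decoupled_llr(enroll, 1.0, 1.0, 0.5, 2.0))
    # Coherent statistics: both scores coincide.
    print(decoupled_llr(enroll, 1.0, 1.0, 0.5, 0.5),
          conventional_llr(enroll, 1.0, 1.0, 0.5))
```

In this toy setting, a test sample close to the enrollment mean receives a positive score and a distant one a negative score. A real system would operate on multi-dimensional embeddings and would also have to account for mean and between-speaker differences across domains, which this sketch ignores.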