Paper ID | AUD-32.5 | ||
Paper Title | Audio Replay Spoof Attack Detection by Joint Segment-Based Linear Filter Bank Feature Extraction and Attention-Enhanced DenseNet-BiLSTM Network | ||
Authors | Lian Huang, Chi-Man Pun, University of Macau, Macau SAR China | ||
Session | AUD-32: Audio for Multimedia and Audio Processing Systems | ||
Location | Gather.Town | ||
Session Time | Friday, 11 June, 14:00 - 14:45 | ||
Presentation Time | Friday, 11 June, 14:00 - 14:45 | ||
Presentation | Poster | ||
Topic | Audio and Acoustic Signal Processing: [AUD-SEC] Audio Security | ||
Abstract | Most automatic speaker verification (ASV) systems are vulnerable to various spoofing attacks. To address this issue, we propose a novel model based on an attention-enhanced DenseNet-BiLSTM network and segment-based linear filter bank features. First, silent segments are selected from each speech signal using the short-term zero-crossing rate and energy. If the silent segments together contain too little data, the decaying tails are selected instead. Second, linear filter bank features are extracted from the selected segments in the relatively high-frequency domain. Finally, an attention-enhanced DenseNet-BiLSTM architecture that can avoid overfitting is built. To validate this model, we used two datasets, BTAS2016 and ASVspoof2017. Experiments show that the attention-enhanced DenseNet-BiLSTM model with the segment-based linear filter bank features achieves the best performance. Compared with the baseline system based on constant-Q cepstral coefficients and a Gaussian mixture model (GMM), the proposed model yields relative improvements of 91.68% and 74.04% on the two datasets, respectively. |
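
The first step in the abstract, selecting silent segments from the short-term zero-crossing rate and energy, can be illustrated with a minimal sketch. It is not the authors' implementation: the frame length, hop size, both thresholds, and all function names are assumptions chosen for illustration, and the fallback to decaying tails is not covered.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def short_term_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def select_silent_frames(x, frame_len=400, hop=160,
                         energy_ratio=0.01, zcr_max=0.25):
    """Return a boolean mask marking frames treated as silent.

    Assumed rule (hypothetical, not taken from the paper): a frame is silent
    when its energy is at most `energy_ratio` times the maximum frame energy
    and its zero-crossing rate is below `zcr_max`.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = short_term_energy(frames)
    zcr = zero_crossing_rate(frames)
    return (energy <= energy_ratio * energy.max()) & (zcr < zcr_max)
```

With a 16 kHz signal, the assumed defaults correspond to 25 ms frames with a 10 ms hop. Consecutive `True` entries in the returned mask can be merged into segments, from which features would then be extracted.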