Paper ID | AUD-31.2 |
Paper Title |
A NEW DCASE 2017 RARE SOUND EVENT DETECTION BENCHMARK UNDER EQUAL TRAINING DATA: CRNN WITH MULTI-WIDTH KERNELS |
Authors |
Jan Baumann, Patrick Meyer, Timo Lohrenz, Technische Universität Braunschweig, Germany; Alexander Roy, Michael Papendieck, IAV GmbH, Germany; Tim Fingscheidt, Technische Universität Braunschweig, Germany |
Session | AUD-31: Detection and Classification of Acoustic Scenes and Events 6: Events |
Location | Gather.Town |
Session Time: | Friday, 11 June, 13:00 - 13:45 |
Presentation Time: | Friday, 11 June, 13:00 - 13:45 |
Presentation | Poster |
Topic |
Audio and Acoustic Signal Processing: [AUD-CLAS] Detection and Classification of Acoustic Scenes and Events |
Abstract |
Rare sound event detection (rare SED) aims to extract valuable information from data that consist mostly of acoustic background noise. It has a long research history and was part of the DCASE 2017 Challenge. State-of-the-art performance is currently reached with a stacked combination of a CNN and an RNN, dubbed CRNN, which has also been applied successfully in other domains, such as hybrid automatic speech recognition. In this work, we propose a new CRNN model for rare SED. The new model uses a set of parallel convolutions with multiple kernel widths in the CRNN and is based on an extended feature representation of the log-mel spectrogram. Furthermore, we apply and optimize different evaluation postprocessing methods and analyze the modifications in an ablation study. Using the same training material for all methods, the proposed model outperforms the previously top-scoring networks of the DCASE Challenge on the test set by 6.13% absolute in error rate and by 4.39% absolute in F1 score, and under these conditions achieves a new benchmark result on the DCASE 2017 Rare SED data set. |
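
The abstract reports its gains as event-based error rate and F1 score. As a rough illustration of how such scores can be computed, the sketch below matches estimated onsets to reference onsets within a tolerance collar and derives both metrics from the resulting counts. The function names, the greedy matching strategy, the 0.5 s collar, and the example onsets are assumptions made for illustration and are not taken from the paper. Evaluation is assumed to be per event class, so substitutions are not counted and the error rate reduces to (deletions + insertions) / number of reference events.

```python
def match_onsets(ref_onsets, est_onsets, collar=0.5):
    """Greedily pair each reference onset (in seconds) with at most one
    estimated onset that lies within +/- collar seconds of it.

    Returns (tp, fp, fn). Hypothetical helper, not the authors' code.
    """
    unused = sorted(est_onsets)
    tp = 0
    for ref in sorted(ref_onsets):
        for i, est in enumerate(unused):
            if abs(est - ref) <= collar:
                tp += 1
                del unused[i]  # each estimate may match only one reference
                break
    fn = len(ref_onsets) - tp  # missed reference events (deletions)
    fp = len(unused)           # unmatched estimates (insertions)
    return tp, fp, fn


def error_rate_and_f1(tp, fp, fn):
    """Event-based error rate and F1 score from match counts.

    Assumes per-class evaluation, so no substitutions are counted.
    """
    n_ref = tp + fn
    if n_ref == 0:
        raise ValueError("at least one reference event is required")
    error_rate = (fn + fp) / n_ref
    f1 = 2 * tp / (2 * tp + fp + fn)
    return error_rate, f1


# Made-up onsets (seconds), purely for illustration.
reference = [1.0, 5.0, 9.0, 14.0]
estimated = [1.2, 5.9, 9.4, 20.0]

tp, fp, fn = match_onsets(reference, estimated)
er, f1 = error_rate_and_f1(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} ER={er:.2f} F1={f1:.2f}")
```

With these made-up onsets, two events are detected, two are missed, and two estimates are spurious. An improvement stated as "absolute" is a difference in percentage points between two systems' scores, not a relative reduction.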