Paper ID | AUD-30.6 | ||
Paper Title | SLOW-FAST AUDITORY STREAMS FOR AUDIO RECOGNITION | ||
Authors | Evangelos Kazakos, University of Bristol, United Kingdom; Arsha Nagrani, Andrew Zisserman, University of Oxford, United Kingdom; Dima Damen, University of Bristol, United Kingdom | ||
Session | AUD-30: Detection and Classification of Acoustic Scenes and Events 5: Scenes | ||
Location | Gather.Town | ||
Session Time | Friday, 11 June, 13:00 - 13:45 | ||
Presentation Time | Friday, 11 June, 13:00 - 13:45 | ||
Presentation | Poster | ||
Topic | Audio and Acoustic Signal Processing: [AUD-CLAS] Detection and Classification of Acoustic Scenes and Events | ||
Abstract | We propose a two-stream convolutional network for audio recognition that operates on time-frequency spectrogram inputs. Following similar success in visual recognition, we learn Slow-Fast auditory streams with separable convolutions and multi-level lateral connections. The Slow pathway has high channel capacity, while the Fast pathway operates at a fine-grained temporal resolution. We showcase the importance of our two-stream proposal on two diverse datasets, VGG-Sound and EPIC-KITCHENS-100, and achieve state-of-the-art results on both. |
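
The two-stream idea in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the temporal stride `alpha`, the function names, and the use of averaging for the lateral connection are all assumptions made for illustration, and the convolutional layers themselves (including the separable convolutions) are omitted. The sketch only shows how a Slow stream that sees a temporally subsampled spectrogram could be fused with a Fast stream that sees every frame.

```python
from typing import List

# A spectrogram is represented as a list of frames; each frame is a list of
# frequency-bin values, i.e. spec[time][frequency].
Spectrogram = List[List[float]]


def slow_input(spec: Spectrogram, alpha: int) -> Spectrogram:
    """Slow stream input: keep every alpha-th frame (coarse temporal resolution).

    `alpha` is a hypothetical temporal stride, not a value taken from the paper.
    """
    return spec[::alpha]


def fast_input(spec: Spectrogram) -> Spectrogram:
    """Fast stream input: keep every frame (fine temporal resolution)."""
    return spec


def lateral_fuse(slow: Spectrogram, fast: Spectrogram, alpha: int) -> Spectrogram:
    """Illustrative lateral connection from Fast to Slow.

    Each group of `alpha` consecutive Fast frames is averaged so that it lines
    up with one Slow frame, then the pooled features are concatenated to the
    Slow features. Averaging is an assumption standing in for a learned fusion.
    """
    fused = []
    for i, slow_frame in enumerate(slow):
        group = fast[i * alpha:(i + 1) * alpha]
        pooled = [sum(column) / len(group) for column in zip(*group)]
        fused.append(slow_frame + pooled)
    return fused


# Toy spectrogram: 8 time frames, 2 frequency bins.
spec = [[float(t), 2.0 * t] for t in range(8)]
slow = slow_input(spec, alpha=4)
fast = fast_input(spec)
fused = lateral_fuse(slow, fast, alpha=4)
```

In this toy setup the Slow stream keeps 2 of the 8 frames while the Fast stream keeps all 8, and each fused frame carries the Slow features followed by the time-pooled Fast features. In a real network, each stream would apply its own convolutional stack, with the fusion repeated at several depths.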