SS-10.4
Polyphonic audio event detection: multi-label or multi-class multi-task classification problem?
Huy Phan, Queen Mary University of London, United Kingdom of Great Britain and Northern Ireland; Thi Ngoc Tho Nguyen, Nanyang Technological University, Singapore; Philipp Koch, Alfred Mertins, University of Lübeck, Germany
Session:
Signal Processing and Neural Approaches for Soundscapes (SiNApS)
Track:
Special Sessions
Location:
Gather Area A
Presentation Time:
Wed, 11 May, 20:00 - 20:45 China Time (UTC +8)
Wed, 11 May, 12:00 - 12:45 UTC
Session Co-Chairs:
Woon-Seng Gan, Nanyang Technological University; Bhan Lam, Nanyang Technological University; Wenwu Wang, University of Surrey; Yuki Mitsufuji, Sony Group Corporation
Session SS-10
SS-10.1: CONFORMER-BASED SELF-SUPERVISED LEARNING FOR NON-SPEECH AUDIO TASKS
Sangeeta Srivastava, The Ohio State University, United States of America; Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf, Meta, United States of America
SS-10.2: UNSUPERVISED AUDIO-CAPTION ALIGNING LEARNS CORRESPONDENCES BETWEEN INDIVIDUAL SOUND EVENTS AND TEXTUAL PHRASES
Huang Xie, Okko Räsänen, Konstantinos Drossos, Tuomas Virtanen, Tampere University, Finland
SS-10.3: SPATIAL DATA AUGMENTATION WITH SIMULATED ROOM IMPULSE RESPONSES FOR SOUND EVENT LOCALIZATION AND DETECTION
Yuichiro Koyama, Masafumi Takahashi, Kazuki Shimada, Naoya Takahashi, Emiru Tsunoo, Shusuke Takahashi, Yuki Mitsufuji, Sony Group Corporation, Japan; Kazuhide Shigemi, The University of Tokyo, Japan
SS-10.4: Polyphonic audio event detection: multi-label or multi-class multi-task classification problem?
Huy Phan, Queen Mary University of London, United Kingdom of Great Britain and Northern Ireland; Thi Ngoc Tho Nguyen, Nanyang Technological University, Singapore; Philipp Koch, Alfred Mertins, University of Lübeck, Germany
SS-10.5: DIVERSE AUDIO CAPTIONING VIA ADVERSARIAL TRAINING
Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark Plumbley, Wenwu Wang, University of Surrey, United Kingdom of Great Britain and Northern Ireland
SS-10.6: PROBABLY PLEASANT? A NEURAL-PROBABILISTIC APPROACH TO AUTOMATIC MASKER SELECTION FOR URBAN SOUNDSCAPE AUGMENTATION
Kenneth Ooi, Karn N. Watcharasupat, Bhan Lam, Zhen-Ting Ong, Woon-Seng Gan, Nanyang Technological University, Singapore