IEEE ICASSP 2022

2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022
  • Virtual (all paper presentations)
22-27 May 2022
  • Main Venue: Marina Bay Sands Expo & Convention Center, Singapore
27-28 October 2022
  • Satellite Venue: Crowne Plaza Shenzhen Longgang City Centre, Shenzhen, China

SS-10.5

DIVERSE AUDIO CAPTIONING VIA ADVERSARIAL TRAINING

Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark Plumbley, Wenwu Wang, University of Surrey, United Kingdom of Great Britain and Northern Ireland

Session:
Signal Processing and Neural Approaches for Soundscapes (SiNApS)

Track:
Special Sessions

Location:
Gather Area A

Presentation Time:
Wed, 11 May, 20:00 - 20:45 China Time (UTC +8)
Wed, 11 May, 12:00 - 12:45 UTC

Session Co-Chairs:
Woon-Seng Gan, Nanyang Technological University; Bhan Lam, Nanyang Technological University; Wenwu Wang, University of Surrey; Yuki Mitsufuji, Sony Group Corporation
Session SS-10
SS-10.1: CONFORMER-BASED SELF-SUPERVISED LEARNING FOR NON-SPEECH AUDIO TASKS
Sangeeta Srivastava, The Ohio State University, United States of America; Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf, Meta, United States of America
SS-10.2: UNSUPERVISED AUDIO-CAPTION ALIGNING LEARNS CORRESPONDENCES BETWEEN INDIVIDUAL SOUND EVENTS AND TEXTUAL PHRASES
Huang Xie, Okko Räsänen, Konstantinos Drossos, Tuomas Virtanen, Tampere University, Finland
SS-10.3: SPATIAL DATA AUGMENTATION WITH SIMULATED ROOM IMPULSE RESPONSES FOR SOUND EVENT LOCALIZATION AND DETECTION
Yuichiro Koyama, Masafumi Takahashi, Kazuki Shimada, Naoya Takahashi, Emiru Tsunoo, Shusuke Takahashi, Yuki Mitsufuji, Sony Group Corporation, Japan; Kazuhide Shigemi, The University of Tokyo, Japan
SS-10.4: POLYPHONIC AUDIO EVENT DETECTION: MULTI-LABEL OR MULTI-CLASS MULTI-TASK CLASSIFICATION PROBLEM?
Huy Phan, Queen Mary University of London, United Kingdom of Great Britain and Northern Ireland; Thi Ngoc Tho Nguyen, Nanyang Technological University, Singapore; Philipp Koch, Alfred Mertins, University of Lübeck, Germany
SS-10.5: DIVERSE AUDIO CAPTIONING VIA ADVERSARIAL TRAINING
Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark Plumbley, Wenwu Wang, University of Surrey, United Kingdom of Great Britain and Northern Ireland
SS-10.6: PROBABLY PLEASANT? A NEURAL-PROBABILISTIC APPROACH TO AUTOMATIC MASKER SELECTION FOR URBAN SOUNDSCAPE AUGMENTATION
Kenneth Ooi, Karn N. Watcharasupat, Bhan Lam, Zhen-Ting Ong, Woon-Seng Gan, Nanyang Technological University, Singapore