AUD-33.2
AUDIOCLIP: EXTENDING CLIP TO IMAGE, TEXT AND AUDIO
Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Germany
Session:
Extended Evaluation and Captioning
Track:
Audio and Acoustic Signal Processing
Location:
Gather Area K
Presentation Time:
Fri, 13 May, 21:00 - 21:45 China Time (UTC +8)
Fri, 13 May, 13:00 - 13:45 UTC
Session Chair:
Emanuël Habets, University of Erlangen-Nuremberg
Session AUD-33
AUD-33.1: DIVERSITY-CONTROLLABLE AND ACCURATE AUDIO CAPTIONING BASED ON NEURAL CONDITION
Xuenan Xu, Mengyue Wu, Kai Yu, Shanghai Jiao Tong University, China
AUD-33.2: AUDIOCLIP: EXTENDING CLIP TO IMAGE, TEXT AND AUDIO
Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Germany
AUD-33.3: CAN AUDIO CAPTIONS BE EVALUATED WITH IMAGE CAPTION METRICS?
Zelin Zhou, Zhiling Zhang, Xuenan Xu, Zeyu Xie, Mengyue Wu, Kenny Zhu, Shanghai Jiao Tong University, China
AUD-33.4: A DATA-DRIVEN COGNITIVE SALIENCE MODEL FOR OBJECTIVE PERCEPTUAL AUDIO QUALITY ASSESSMENT
Pablo M. Delgado, Jürgen Herre, International Audio Laboratories Erlangen, Germany
AUD-33.5: Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-box Acoustic Models
Ryosuke Sawata, Yosuke Kashiwagi, Shusuke Takahashi, Sony Group Corporation, Japan
AUD-33.6: EFFECT OF NOISE SUPPRESSION LOSSES ON SPEECH DISTORTION AND ASR PERFORMANCE
Sebastian Braun, Hannes Gamper, Microsoft, United States of America