MLSP-53.1
WAV2CLIP: LEARNING ROBUST AUDIO REPRESENTATIONS FROM CLIP
Ho-Hsiang Wu, Juan Bello, New York University, United States of America; Prem Seetharaman, Kundan Kumar, Descript, United States of America
Session: Multimodal Analysis in Audio Applications
Track: Machine Learning for Signal Processing
Location: Gather Area H
Presentation Time: Fri, 13 May, 22:00 - 22:45 China Time (UTC +8) / Fri, 13 May, 14:00 - 14:45 UTC
Session Chair: Sharon Gannot, Bar-Ilan University
Session MLSP-53: Multimodal Analysis in Audio Applications
MLSP-53.1: WAV2CLIP: LEARNING ROBUST AUDIO REPRESENTATIONS FROM CLIP
Ho-Hsiang Wu, Juan Bello, New York University, United States of America; Prem Seetharaman, Kundan Kumar, Descript, United States of America
MLSP-53.2: ASD-TRANSFORMER: EFFICIENT ACTIVE SPEAKER DETECTION USING SELF AND MULTIMODAL TRANSFORMERS
Gourav Datta, University of Southern California, United States of America; Tyler Etchart, Vivek Yadav, Varsha Hedau, Pradeep Natarajan, Shih-Fu Chang, Amazon, United States of America
MLSP-53.3: MMLATCH: BOTTOM-UP TOP-DOWN FUSION FOR MULTIMODAL SENTIMENT ANALYSIS
Georgios Paraskevopoulos, Efthymios Georgiou, Alexandros Potamianos, National Technical University of Athens, Greece
MLSP-53.4: MULTI-CHANNEL ATTENTIVE GRAPH CONVOLUTIONAL NETWORK WITH SENTIMENT FUSION FOR MULTIMODAL SENTIMENT ANALYSIS
Luwei Xiao, Xingjiao Wu, Wen Wu, Jing Yang, Liang He, East China Normal University, China
MLSP-53.5: LEARNING MUSIC SEQUENCE REPRESENTATION FROM TEXT SUPERVISION
Tianyu Chen, Shuai Zhang, Haoyi Zhou, Jianxin Li, Beihang University, China; Yuan Xie, The Institute of Acoustics of the Chinese Academy of Sciences, China; Shaohan Huang, Microsoft, China
MLSP-53.6: ENHANCING AFFECTIVE REPRESENTATIONS OF MUSIC-INDUCED EEG THROUGH MULTIMODAL SUPERVISION AND LATENT DOMAIN ADAPTATION
Kleanthis Avramidis, University of Southern California, United States of America; Christos Garoufis, Athanasia Zlatintsi, Petros Maragos, National Technical University of Athens, Greece