SPE-36: Speech Enhancement 6: Multi-modal Processing |
Session Type: Poster |
Time: Thursday, 10 June, 14:00 - 14:45 |
Location: Gather.Town |
Session Chair: Chandan K A Reddy, Microsoft

SPE-36.1: AUDIO-VISUAL SPEECH INPAINTING WITH DEEP LEARNING |
Giovanni Morrone; University of Modena and Reggio Emilia |
Daniel Michelsanti; Aalborg University |
Zheng-Hua Tan; Aalborg University |
Jesper Jensen; Aalborg University |
|
SPE-36.2: VSET: A MULTIMODAL TRANSFORMER FOR VISUAL SPEECH ENHANCEMENT |
Karthik Ramesh; Huawei |
Chao Xing; Huawei |
Wupeng Wang; Huawei |
Dong Wang; Tsinghua University |
Xiao Chen; Huawei |
|
SPE-36.3: SWITCHING VARIATIONAL AUTO-ENCODERS FOR NOISE-AGNOSTIC AUDIO-VISUAL SPEECH ENHANCEMENT |
Mostafa Sadeghi; Inria, Grenoble Alpes |
Xavier Alameda-Pineda; Inria, Grenoble Alpes |
|
SPE-36.4: AUDIO-VISUAL SPEECH ENHANCEMENT METHOD CONDITIONED ON THE LIP MOTION AND SPEAKER-DISCRIMINATIVE EMBEDDINGS |
Koichiro Ito; Hitachi, Ltd. |
Masaaki Yamamoto; Hitachi, Ltd. |
Kenji Nagamatsu; Hitachi, Ltd. |
|
SPE-36.5: AUDIO-VISUAL SPEECH SEPARATION USING CROSS-MODAL CORRESPONDENCE LOSS |
Naoki Makishima; NTT Media Intelligence Laboratories, NTT Corporation |
Mana Ihori; NTT Media Intelligence Laboratories, NTT Corporation |
Akihiko Takashima; NTT Media Intelligence Laboratories, NTT Corporation |
Tomohiro Tanaka; NTT Media Intelligence Laboratories, NTT Corporation |
Shota Orihashi; NTT Media Intelligence Laboratories, NTT Corporation |
Ryo Masumura; NTT Media Intelligence Laboratories, NTT Corporation |
|
SPE-36.6: MUSE: MULTI-MODAL TARGET SPEAKER EXTRACTION WITH VISUAL CUES |
Zexu Pan; National University of Singapore |
Ruijie Tao; National University of Singapore |
Chenglin Xu; National University of Singapore |
Haizhou Li; National University of Singapore |
|