MMSP-1.1
DISTRIBUTED AUDIO-VISUAL PARSING BASED ON MULTIMODAL TRANSFORMER AND DEEP JOINT SOURCE CHANNEL CODING
Penghong Wang, Xiaopeng Fan, Harbin Institute of Technology, China; Jiahui Li, Mengyao Ma, Huawei, China
Session: Multimodal Signal Processing, Analysis, and Synthesis I
Track: Multimedia Signal Processing
Location: Gather Area O
Presentation Time: Sun, 8 May, 22:00 - 22:45 China Time (UTC +8) / Sun, 8 May, 14:00 - 14:45 UTC
Session Chair: Wei Hu, Peking University
Session MMSP-1
MMSP-1.1: DISTRIBUTED AUDIO-VISUAL PARSING BASED ON MULTIMODAL TRANSFORMER AND DEEP JOINT SOURCE CHANNEL CODING
Penghong Wang, Xiaopeng Fan, Harbin Institute of Technology, China; Jiahui Li, Mengyao Ma, Huawei, China
MMSP-1.2: TALKINGFLOW: TALKING FACIAL LANDMARK GENERATION WITH MULTI-SCALE NORMALIZING FLOW NETWORK
Sen Liang, Zhize Zhou, Hujun Bao, Zhejiang University, China; Rong Li, Zhejiang Lab, China; Juyong Zhang, University of Science and Technology of China, China
MMSP-1.3: INCORPORATING GAZE BEHAVIOR USING JOINT EMBEDDING WITH SCENE CONTEXT FOR DRIVER TAKEOVER DETECTION
Yuning Qiu, Carlos Busso, University of Texas at Dallas, United States of America; Teruhisa Misu, Kumar Akash, Honda Research Institute USA, Inc., United States of America
MMSP-1.4: MULTI-VIEW AND MULTI-MODAL EVENT DETECTION UTILIZING TRANSFORMER-BASED MULTI-SENSOR FUSION
Masahiro Yasuda, Yasunori Ohishi, Shoichiro Saito, Noboru Harada, NTT Corporation, Japan
MMSP-1.5: DISTRIBUTED LABEL DEQUANTIZED GAUSSIAN PROCESS LATENT VARIABLE MODEL FOR MULTI-VIEW DATA INTEGRATION
Koshi Watanabe, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama, Hokkaido University, Japan
MMSP-1.6: CO-ATTENTION-GUIDED BILINEAR MODEL FOR ECHO-BASED DEPTH ESTIMATION
Go Irie, Takashi Shibata, Akisato Kimura, NTT Corporation, Japan