2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper ID HLT-10.1
Paper Title INCORPORATING SYNTACTIC AND PHONETIC INFORMATION INTO MULTIMODAL WORD EMBEDDINGS USING GRAPH CONVOLUTIONAL NETWORKS
Authors Wenhao Zhu, Shuang Liu, Chaoming Liu, Shanghai University, China
Session HLT-10: Multi-modality in Language
Location Gather.Town
Session Time: Wednesday, 09 June, 16:30 - 17:15
Presentation Time: Wednesday, 09 June, 16:30 - 17:15
Presentation Poster
Topic Human Language Technology: [HLT-MMPL] Multimodal Processing of Language
Abstract Multimodal models have been shown to outperform text-based models at learning semantic word representations. According to psycholinguistic theory, the modalities of language are related in a graph-like structure, and in recent years the graph convolutional network (GCN) has been shown to have substantial advantages in extracting features from non-Euclidean data. This inspires us to propose a new multimodal word representation model, GCNW, which uses a graph convolutional network to incorporate phonetic and syntactic information into the word representation. We use a greedy strategy to update the modality-relation matrix in the GCN, and we train the model through unsupervised learning. We evaluate the proposed model on multiple downstream NLP tasks, and the experimental results demonstrate that GCNW outperforms strong unimodal baselines and state-of-the-art multimodal models. We make the source code of both models available to encourage reproducible research.
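
For readers unfamiliar with graph convolution, the following is a minimal sketch of one generic GCN layer using the widely used symmetric-normalization propagation rule. It is not the GCNW model from the paper: the abstract does not give the architecture, the form of the modality-relation matrix, or the greedy update, so none of those are reproduced here. The three-node toy graph, the feature sizes, and the function names `normalize_adjacency` and `gcn_layer` are all assumptions made for illustration.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a non-negative adjacency matrix A."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degrees are >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj, features, weights):
    """One graph convolution step: ReLU(A_norm @ X @ W)."""
    return np.maximum(normalize_adjacency(adj) @ features @ weights, 0.0)


# Hypothetical toy graph: node 0 is linked to nodes 1 and 2. The nodes could
# stand for textual, phonetic and syntactic views of a word (an assumption).
adj = np.array([[0.0, 1.0, 1.0],
                [1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0]])

rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))   # one 4-dimensional feature vector per node
weights = rng.normal(size=(4, 2))    # layer weights (random here, learned in practice)

hidden = gcn_layer(adj, features, weights)
print(hidden.shape)
```

Each output row mixes a node's own features with those of its neighbours, weighted by the normalized adjacency. In a trained model the weight matrix would be learned rather than drawn at random.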