Paper ID | MLR-APPL-MDSP.10 |
Paper Title | TWO-STREAM HYBRID ATTENTION NETWORK FOR MULTIMODAL CLASSIFICATION |
Authors | Qipin Chen, Pennsylvania State University, United States; Zhenyu Shi, Zhen Zuo, Jinmiao Fu, Yi Sun, Amazon LLC, United States |
Session | MLR-APPL-MDSP: Machine learning for multidimensional signal processing |
Location | Area F |
Session Time | Monday, 20 September, 13:30 - 15:00 |
Presentation Time | Monday, 20 September, 13:30 - 15:00 |
Presentation | Poster |
Topic | Multidimensional Signal Processing: Signal and system modeling and identification |
Abstract | On modern e-commerce platforms such as Amazon, the number of products is growing fast, so precise and efficient product classification has become a key lever for a great customer shopping experience. In large-scale product classification, a major challenge is how to leverage multimodal product information (e.g., image and text). One of the most successful directions is attention-based deep multimodal learning, which mainly uses two types of frameworks: 1) keyless attention, which learns the importance of features within each modality; and 2) key-based attention, which learns the importance of features using other modalities. In this paper, we propose a novel Two-stream Hybrid Attention Network (HANet), which leverages both key-based and keyless attention mechanisms to capture the key information across the product image and title modalities. We show experimentally that HANet achieves state-of-the-art performance on an Amazon-scale product classification problem. |
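The distinction between keyless attention (weights computed from one modality alone) and key-based attention (weights for one modality computed using a key from another modality) can be illustrated with a small NumPy sketch. This is not the HANet architecture, whose details are not given in this abstract. It assumes image region features and title token embeddings have already been projected to a shared dimension `d`; the function names (`keyless_attention`, `key_based_attention`, `hybrid_fusion`), the tanh scoring, the scaled bilinear key-based scoring, and the concatenation-plus-linear classifier are all hypothetical choices made for illustration, with random weights standing in for learned parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def keyless_attention(feats, w):
    """Keyless attention (illustrative): score each feature vector of one
    modality using only a learned vector w, with no input from other modalities.
    feats: (n, d), w: (d,) -> pooled (d,), weights (n,)"""
    scores = np.tanh(feats) @ w
    alpha = softmax(scores)
    return alpha @ feats, alpha


def key_based_attention(feats, key, W):
    """Key-based attention (illustrative): score the features of one modality
    against a key vector derived from another modality via a bilinear form.
    feats: (n, d), key: (d,), W: (d, d) -> pooled (d,), weights (n,)"""
    scores = feats @ W @ key / np.sqrt(feats.shape[1])
    alpha = softmax(scores)
    return alpha @ feats, alpha


def hybrid_fusion(img, txt, params):
    """Hypothetical two-stream fusion that combines both attention types."""
    # Keyless summaries of each stream.
    img_sum, _ = keyless_attention(img, params["w_img"])
    txt_sum, _ = keyless_attention(txt, params["w_txt"])
    # Cross-modal (key-based) summaries: text guides image, image guides text.
    img_ctx, _ = key_based_attention(img, txt_sum, params["W_img"])
    txt_ctx, _ = key_based_attention(txt, img_sum, params["W_txt"])
    fused = np.concatenate([img_sum, txt_sum, img_ctx, txt_ctx])  # (4d,)
    return softmax(params["W_cls"] @ fused)  # class probabilities


# Usage with random stand-ins for learned parameters and extracted features.
rng = np.random.default_rng(0)
d, n_img, n_txt, n_classes = 8, 5, 7, 3
params = {
    "w_img": rng.normal(size=d),
    "w_txt": rng.normal(size=d),
    "W_img": rng.normal(size=(d, d)),
    "W_txt": rng.normal(size=(d, d)),
    "W_cls": rng.normal(size=(n_classes, 4 * d)),
}
img_feats = rng.normal(size=(n_img, d))  # e.g. image region features
txt_feats = rng.normal(size=(n_txt, d))  # e.g. title token embeddings
probs = hybrid_fusion(img_feats, txt_feats, params)
print(probs.shape, round(float(probs.sum()), 6))
```

Concatenating the keyless and key-based summaries is only one simple way to let a classifier see both within-modality and cross-modality evidence. Gated or weighted fusion would be equally plausible alternatives.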