2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper ID IVMSP-7.5
Paper Title MULTI-ORDER ADVERSARIAL REPRESENTATION LEARNING FOR COMPOSED QUERY IMAGE RETRIEVAL
Authors Zhixiao Fu, Zhejiang University, China; Xinyuan Chen, East China Normal University, China; Jianfeng Dong, Zhejiang Gongshang University, China; Shouling Ji, Zhejiang University, China
Session IVMSP-7: Machine Learning for Image Processing I
Location Gather.Town
Session Time: Wednesday, 09 June, 13:00 - 13:45
Presentation Time: Wednesday, 09 June, 13:00 - 13:45
Presentation Poster
Topic Image, Video, and Multidimensional Signal Processing: [IVSMR] Image & Video Sensing, Modeling, and Representation
Abstract This paper targets the task of composed query image retrieval. Given a composed query consisting of a reference image and modification text, the task aims to retrieve images that are generally similar to the reference image but differ according to the modification text. The task is challenging because of the complexity of the composed query and the cross-modality characteristics between the query and the candidate images. The common paradigm is to first obtain a fused feature of the reference image and the text, and then project it into a common embedding space with the candidate images. However, most existing works aim only at high-level representations, ignoring low-level representations that may be complementary to them. This paper therefore proposes a new Multi-order Adversarial Network (MAN), which uses multi-level representations and simultaneously explores their low-order and high-order interactions, obtaining low-order and high-order features. The low-order features reflect the patterns of the individual features themselves, while the high-order features capture the interactions between features. Moreover, we introduce an adversarial module to constrain the fusion of the reference image and the text. Extensive experiments on three datasets verify the effectiveness of MAN and demonstrate its state-of-the-art performance.
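The common paradigm mentioned in the abstract (fuse the reference image and the text, then compare against candidates in a shared embedding space) can be illustrated with a minimal sketch. This is not the MAN model: the element-wise sum used for fusion is a hypothetical stand-in for a learned fusion module, the three-dimensional feature vectors are toy values rather than encoder outputs, and the multi-order interactions and adversarial module are not modeled. All function names are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_query(image_feat, text_feat):
    """Hypothetical fusion: element-wise sum of image and text features.

    A stand-in for a learned fusion module; assumed, not taken from the paper.
    """
    return l2_normalize([i + t for i, t in zip(image_feat, text_feat)])


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def rank_candidates(image_feat, text_feat, candidates):
    """Rank candidate images by similarity to the fused query, best first."""
    query = fuse_query(image_feat, text_feat)
    scored = [
        (name, cosine(query, l2_normalize(feat)))
        for name, feat in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy features: axis 0 stands for the reference content,
# axis 1 for the attribute requested by the modification text.
reference = [1.0, 0.0, 0.0]
modification = [0.0, 1.0, 0.0]
candidates = {
    "unchanged": [1.0, 0.0, 0.0],   # same as the reference, no modification
    "modified": [1.0, 1.0, 0.0],    # reference content plus requested change
    "unrelated": [0.0, 0.0, 1.0],   # shares nothing with the query
}

ranking = rank_candidates(reference, modification, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the candidate that keeps the reference content and also carries the requested change ranks first, the unmodified copy second, and the unrelated image last, which is the behavior the task description asks for.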