Paper ID | MLR-APPL-IVSMR-3.10 | ||
Paper Title | Weakly-supervised Moment Retrieval Network for Video Corpus Moment Retrieval | ||
Authors | Sunjae Yoon, Dahyun Kim, Ji Woo Hong, Junyeong Kim, Kookhoi Kim, Chang D. Yoo, Korea Advanced Institute of Science and Technology, Republic of Korea | ||
Session | MLR-APPL-IVSMR-3: Machine learning for image and video sensing, modeling and representation 3 | ||
Location | Area D | ||
Session Time: | Wednesday, 22 September, 14:30 - 16:00 | ||
Presentation Time: | Wednesday, 22 September, 14:30 - 16:00 | ||
Presentation | Poster | ||
Topic | Applications of Machine Learning: Machine learning for image & video sensing, modeling, and representation | ||
IEEE Xplore Open Preview | Click here to view in IEEE Xplore | ||
Abstract | This paper proposes Weakly-supervised Moment Retrieval Network (WMRN) for Video Corpus Moment Retrieval (VCMR), which retrieves pertinent temporal moments related to natural language query in a large video corpus. Previous methods for VCMR require full supervision of temporal boundary information for training, which involves a labor-intensive process of annotating the boundaries in a large number of videos. To leverage this, the proposed WMRN performs VCMR in a weakly-supervised manner, where WMRN is learned without ground-truth labels but only with video and text queries. For weakly-supervised VCMR, WMRN addresses the following two limitations of prior methods: (1) Blurry attention over video features due to redundant video candidate proposals generation, (2) Insufficient learning due to weak supervision only with video-query pairs. To this end, WMRN is based on (1) Text Guided Proposal Generation (TGPG) that effectively generates text guided multi-scale video proposals in the prospective region related to query, and (2) Hard Negative Proposal Sampling (HNPS) that enhances video-language alignment via extracting negative video proposals in positive video sample for contrastive learning. Experimental results show that WMRN achieves state-of-the-art performance on TVR and DiDeMo benchmarks in the weakly-supervised setting. To validate the attainments of proposed components of WMRN, comprehensive ablation studies and qualitative analysis are conducted. |