| Paper ID | MLR-APPL-IVSMR-2.10 | ||
| Paper Title | HIERARCHICAL DOMAIN-CONSISTENT NETWORK FOR CROSS-DOMAIN OBJECT DETECTION | ||
| Authors | Yuanyuan Liu, Ziyang Liu, Fang Fang, China University of Geosciences, Wuhan, China; Zhanghua Fu, The Chinese University of Hong Kong, China; Zhanlong Chen, China University of Geosciences, Wuhan, China | ||
| Session | MLR-APPL-IVSMR-2: Machine learning for image and video sensing, modeling and representation 2 | ||
| Location | Area D | ||
| Session Time | Tuesday, 21 September, 15:30 - 17:00 | ||
| Presentation Time | Tuesday, 21 September, 15:30 - 17:00 | ||
| Presentation | Poster | ||
| Topic | Applications of Machine Learning: Machine learning for image & video sensing, modeling, and representation | ||
| Abstract | Cross-domain object detection is challenging because domain shift occurs at multiple levels when a detector is applied to an unseen domain. To address this problem, this paper proposes a hierarchical domain-consistent network (HDCN) for cross-domain object detection, which suppresses pixel-level, image-level, and instance-level domain shift by jointly aligning features at all three levels. First, at the pixel-level feature alignment stage, a pixel-level subnet with foreground-aware attention learning and pixel-level adversarial learning focuses on local, transferable foreground information. Then, at the image-level feature alignment stage, global domain-invariant features are learned from the whole image through image-level adversarial learning. Finally, at the instance-level alignment stage, a prototype graph convolution network aligns the instance distributions by minimizing the distance between prototypes of the same category from different domains. Moreover, to avoid non-convergence during multi-level feature alignment, a domain-consistent loss is proposed to harmonize the adaptation training process. Comprehensive results on various cross-domain detection tasks demonstrate the broad applicability and effectiveness of the proposed approach. | ||
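
The instance-level idea in the abstract, pulling together prototypes of the same category from different domains, can be illustrated with a small sketch. This is not the authors' prototype graph convolution network: it swaps in plain per-class mean vectors as prototypes and a squared Euclidean distance as the alignment term. The function names `class_prototypes` and `prototype_alignment_loss` are hypothetical, and the target-domain labels are assumed to be pseudo-labels, since the abstract does not say how target categories are obtained.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def class_prototypes(
    features: Sequence[Sequence[float]], labels: Sequence[int]
) -> Dict[int, Vector]:
    """Return the mean feature vector (prototype) of each class.

    Assumption: a prototype is a simple class mean, not a graph-refined one.
    """
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = defaultdict(int)
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
        # Accumulate the feature vector into the running sum of its class.
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {c: [s / counts[c] for s in vec] for c, vec in sums.items()}


def prototype_alignment_loss(
    src_feats: Sequence[Sequence[float]],
    src_labels: Sequence[int],
    tgt_feats: Sequence[Sequence[float]],
    tgt_labels: Sequence[int],
) -> float:
    """Mean squared Euclidean distance between same-class prototypes.

    Only classes present in both domains contribute; the result is 0.0
    when the two domains share no class.
    """
    src = class_prototypes(src_feats, src_labels)
    tgt = class_prototypes(tgt_feats, tgt_labels)
    shared = sorted(set(src) & set(tgt))
    if not shared:
        return 0.0
    total = 0.0
    for c in shared:
        total += sum((a - b) ** 2 for a, b in zip(src[c], tgt[c]))
    return total / len(shared)


if __name__ == "__main__":
    # Toy instance features: three source instances, two target instances.
    source_features = [[0.0, 0.0], [2.0, 2.0], [1.0, 0.0]]
    source_labels = [0, 0, 1]
    target_features = [[1.0, 1.0], [4.0, 4.0]]
    target_labels = [0, 1]  # assumed pseudo-labels
    print(prototype_alignment_loss(
        source_features, source_labels, target_features, target_labels
    ))
```

In a real detector such a term would be computed on differentiable instance features and minimized jointly with the detection and adversarial losses; the pure-Python version here only shows the quantity being minimized.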