Paper ID | MLR-APPL-IVSMR-1.10
Paper Title | DIFFERENTIABLE DYNAMIC CHANNEL ASSOCIATION FOR KNOWLEDGE DISTILLATION
Authors | Qiankun Tang, Zhejiang Lab, China; Xiaogang Xu, Zhejiang Gongshang University, China; Jun Wang, Zhejiang Lab, China
Session | MLR-APPL-IVSMR-1: Machine learning for image and video sensing, modeling and representation 1
Location | Area C
Session Time | Tuesday, 21 September, 13:30 - 15:00
Presentation Time | Tuesday, 21 September, 13:30 - 15:00
Presentation | Poster
Topic | Applications of Machine Learning: Machine learning for image & video sensing, modeling, and representation
IEEE Xplore Open Preview | Available in IEEE Xplore
Abstract | Knowledge distillation is an effective model compression technique that encourages a small student model to mimic the features or probabilistic outputs of a large teacher model. Existing feature-based distillation methods mainly focus on formulating enriched representations, while naively addressing the channel-dimension gap and adopting handcrafted channel-association strategies between teacher and student for distillation. This not only introduces additional parameters and computational cost, but may also transfer irrelevant information to the student. In this paper, we present a differentiable and efficient Dynamic Channel Association (DCA) mechanism, which automatically associates appropriate teacher channels with each student channel. DCA also enables each student channel to distill knowledge from multiple teacher channels in a weighted manner. Extensive experiments on classification tasks, with various combinations of teacher and student network architectures, demonstrate the effectiveness of the proposed approach.
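The abstract describes the core idea but not the exact formulation. As one illustrative reading, each student channel could hold learnable association logits over all teacher channels; a softmax turns these into weights, each student channel's distillation target becomes a weighted mix of teacher channels, and the whole pipeline stays differentiable so the association is learned jointly with the student. The sketch below (NumPy; the names `dca_distill_loss` and `assoc_logits` are assumptions, not from the paper) illustrates this reading, not the authors' actual method.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dca_distill_loss(f_t, f_s, assoc_logits):
    """Hypothetical weighted channel-association distillation loss.

    f_t          : teacher feature map, shape (C_t, H, W)
    f_s          : student feature map, shape (C_s, H, W)
    assoc_logits : learnable association matrix, shape (C_s, C_t);
                   row s scores how relevant each teacher channel is
                   to student channel s.
    """
    # Softmax over teacher channels: each student channel gets a
    # convex combination of teacher channels (weights sum to 1).
    w = softmax(assoc_logits, axis=1)                  # (C_s, C_t)
    # Build the per-student-channel target as a weighted sum of
    # teacher channels, bridging the channel-dimension gap C_t -> C_s.
    target = np.einsum('st,thw->shw', w, f_t)          # (C_s, H, W)
    # Mean-squared feature-mimicking loss between student and target.
    return ((f_s - target) ** 2).mean()
```

Because the softmax and the weighted sum are both differentiable, gradients flow into `assoc_logits`, so the teacher-to-student channel matching is optimized by the training loss itself rather than fixed by hand.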