2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

IEEE Signal Processing Society

Institute of Electrical and Electronics Engineers (IEEE)

Technical Program

Paper Detail

Paper ID	MLSP-48.4
Paper Title	A COMPACT JOINT DISTILLATION NETWORK FOR VISUAL FOOD RECOGNITION
Authors	Heng Zhao, Kim-Hui Yap, Alex Chichung Kot, Nanyang Technological University, Singapore
Session	MLSP-48: Neural Network Applications
Location	Gather.Town
Session Time	Friday, 11 June, 14:00 - 14:45
Presentation Time	Friday, 11 June, 14:00 - 14:45
Presentation	Poster
Topic	Machine Learning for Signal Processing: [MLR-APPL] Applications of machine learning
Abstract	In recent years, visual food recognition has emerged as an important application in dietary monitoring and management. Existing works use large backbone networks to achieve good performance. However, these networks cannot be deployed on personal portable devices because of their large size and computation cost. Some compact networks have been developed, but their performance is usually lower than that of the large backbone networks. In view of this, this paper proposes a joint distillation framework that aims to achieve high visual food recognition accuracy using a compact network. In contrast to traditional one-directional knowledge distillation methods, the proposed framework trains the large teacher network and the compact student network simultaneously. The framework introduces a new Multi-Layer Distillation (MLD) for simultaneous teacher-student learning at multiple layers of different abstraction. A novel Instance Activation Mapping (IAM) is proposed to jointly train the teacher and student networks using a generated instance-level activation map that incorporates label information for each training image. Experimental results on the two benchmark datasets UECFood-256 and Food-101 show that the trained compact student network achieves state-of-the-art performance of 83.5% and 90.4%, respectively, while achieving a more than fourfold reduction in network model size.
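
The abstract contrasts one-directional distillation with training teacher and student simultaneously. The sketch below illustrates only that general idea, using a generic bidirectional loss on softened output distributions. It is not the paper's MLD or IAM, whose details are not given on this page. The function names, the temperature, and the weight `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over the class axis.
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the true class.
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))

def kl_divergence(p, q):
    # Mean KL(p || q) over the batch.
    return float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1)))

def joint_distillation_losses(teacher_logits, student_logits, labels,
                              temperature=2.0, alpha=0.5):
    # Assumption: each network gets a supervised term plus a term pulling
    # it toward the other network's softened prediction, so knowledge
    # flows in both directions rather than from teacher to student only.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    teacher_loss = cross_entropy(teacher_logits, labels) + alpha * kl_divergence(p_s, p_t)
    student_loss = cross_entropy(student_logits, labels) + alpha * kl_divergence(p_t, p_s)
    return teacher_loss, student_loss

# Toy batch: 4 images, 5 food classes, random logits standing in for network outputs.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 5))
student_logits = rng.normal(size=(4, 5))
labels = np.array([0, 1, 2, 3])
teacher_loss, student_loss = joint_distillation_losses(teacher_logits, student_logits, labels)
print(teacher_loss, student_loss)
```

In an actual training loop both losses would be minimized in the same step, each with respect to its own network's parameters. A multi-layer variant would presumably add comparable terms at intermediate layers, but how the paper does so is not stated here.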