SLT 2018 Paper Detail

Presentation #4
Session: ASR I
Session Time: Wednesday, December 19, 10:00 - 12:00
Presentation Time: Wednesday, December 19, 10:00 - 12:00
Presentation: Poster
Topic: Speech recognition and synthesis
Paper Title: EFFICIENT BUILDING STRATEGY WITH KNOWLEDGE DISTILLATION FOR SMALL-FOOTPRINT ACOUSTIC MODELS
Authors: Takafumi Moriya, Hiroki Kanagawa, Kiyoaki Matsui, Takaaki Fukutomi, Yusuke Shinohara, Yoshikazu Yamaguchi, Manabu Okamoto, Yushi Aono (NTT Corporation)
Abstract: In this paper, we propose a novel training strategy for deep neural network (DNN) based small-footprint acoustic models. The accuracy of DNN-based automatic speech recognition (ASR) systems can be greatly improved by leveraging large amounts of data, and DNNs use many parameters to enhance recognition performance. Unfortunately, resource-constrained local devices are unable to run such complex DNN-based ASR systems. To build compact acoustic models, the knowledge distillation (KD) approach is often used: a large, well-trained model outputs target labels that are used to train a compact model. However, standard KD cannot fully exploit the large model's outputs when training compact models, because the soft logits provide only rough information. We assume that the large model can give more useful hints to the compact model. We propose an advanced KD that uses a mean squared error loss to minimize the discrepancies between the final hidden layer outputs of the two models. We evaluate our proposal on recorded speech data sets assuming car- and home-use scenarios, and show that our models achieve lower character error rates than either conventional KD or from-scratch training on computation-resource-constrained devices.
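The training objective sketched in the abstract combines standard soft-label distillation with a hidden-layer "hint" term. The following is a minimal, hedged NumPy sketch of such a combined loss; the function name `kd_loss`, the temperature `T`, and the weighting `alpha` are illustrative assumptions, not the paper's actual hyperparameters or implementation.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T yields softer distributions.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits,
            student_hidden, teacher_hidden,
            T=2.0, alpha=0.5):
    """Illustrative combined distillation loss (assumed form).

    Soft-label term: cross-entropy between temperature-softened
    teacher and student output distributions (standard KD).
    Hint term: mean squared error between the final hidden layer
    outputs of teacher and student, as the abstract proposes.
    `alpha` weights the two terms (a hypothetical choice).
    """
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    soft_ce = -(p_teacher * log_p_student).sum(axis=-1).mean()
    hint_mse = np.mean((np.asarray(student_hidden, dtype=float)
                        - np.asarray(teacher_hidden, dtype=float)) ** 2)
    return alpha * soft_ce + (1.0 - alpha) * hint_mse
```

In this sketch, matching the teacher's hidden representation drives the MSE term to zero, while the soft cross-entropy term pulls the student's output distribution toward the teacher's; only the weighting between the two is left as a free parameter.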