
SLT 2018


Paper Detail

Presentation #2
Session: ASR III (End-to-End)
Session Time: Friday, December 21, 10:00 - 12:00
Presentation Time: Friday, December 21, 10:00 - 12:00
Presentation: Poster
Topic: Speech recognition and synthesis
Paper Title: COMBINING END-TO-END AND ADVERSARIAL TRAINING FOR LOW-RESOURCE SPEECH RECOGNITION
Authors: Jennifer Drexler, Massachusetts Institute of Technology; James Glass, Massachusetts Institute of Technology
Abstract: In this paper, we develop an end-to-end automatic speech recognition (ASR) model designed for a common low-resource scenario: no pronunciation dictionary or phonemic transcripts, very limited transcribed speech, and much larger non-parallel text and speech corpora. Our semi-supervised model is built on top of an encoder-decoder model with attention and takes advantage of non-parallel speech and text corpora in several ways: a denoising text autoencoder that shares parameters with the ASR decoder, a speech autoencoder that shares parameters with the ASR encoder, and adversarial training that encourages the speech and text encoders to use the same embedding space. We show that a model with this architecture significantly outperforms the baseline in this low-resource condition. We additionally perform an ablation evaluation, demonstrating that all of our added components contribute substantially to the overall performance of our model. We propose several avenues for further work, noting in particular that a model with this architecture could potentially enable fully unsupervised speech recognition.
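To make the abstract's parameter-sharing and adversarial ideas concrete, here is a minimal toy sketch of how such a combined objective could be wired up. All names, shapes, and the use of plain linear maps in place of attention-based encoder-decoder networks are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                      # toy embedding dimension (assumption)

# Toy "parameters": linear maps standing in for real attention networks.
speech_enc = rng.standard_normal((D, D))   # shared by ASR and the speech autoencoder
text_enc   = rng.standard_normal((D, D))   # text-autoencoder encoder
text_dec   = rng.standard_normal((D, D))   # shared by ASR and the text autoencoder
speech_dec = rng.standard_normal((D, D))   # speech-autoencoder decoder
disc_w     = rng.standard_normal(D)        # adversarial discriminator weights

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def disc_loss(z_speech, z_text):
    # Logistic loss: the discriminator labels speech embeddings 1, text 0.
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    p_s = sigmoid(z_speech @ disc_w)
    p_t = sigmoid(z_text @ disc_w)
    return float(-np.mean(np.log(p_s + 1e-9)) - np.mean(np.log(1.0 - p_t + 1e-9)))

def total_loss(speech, noisy_text, text, target_text):
    z_s = speech @ speech_enc              # speech embedding (shared encoder)
    z_t = noisy_text @ text_enc            # text embedding from corrupted input
    asr       = mse(z_s @ text_dec, target_text)   # supervised ASR path
    text_ae   = mse(z_t @ text_dec, text)          # denoising text AE, shares text_dec
    speech_ae = mse(z_s @ speech_dec, speech)      # speech AE, shares speech_enc
    adv       = -disc_loss(z_s, z_t)       # encoders are trained to *fool* the discriminator
    return asr + text_ae + speech_ae + adv

# Toy batch: 4 paired utterances plus a corrupted copy of the text.
speech = rng.standard_normal((4, D))
text   = rng.standard_normal((4, D))
noisy  = text + 0.1 * rng.standard_normal((4, D))
loss   = total_loss(speech, noisy, text, text)
```

The key structural point the sketch tries to show is that `speech_enc` and `text_dec` each appear in two loss terms, so gradients from the unpaired autoencoder data would update the same parameters the supervised ASR path uses, while the negated discriminator term pushes the two embedding distributions together.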