SLT 2018

Paper Detail

Presentation #10
Session: Spoken Language Understanding
Session Time: Wednesday, December 19, 10:00 - 12:00
Presentation Time: Wednesday, December 19, 10:00 - 12:00
Presentation: Poster
Topic: Spoken language understanding
Paper Title: From Audio to Semantics: Approaches to End-to-End Spoken Language Understanding
Authors: Parisa Haghani (Google)
         Arun Narayanan (Google)
         Michiel Bacchiani (Google)
         Galen Chuang (Google)
         Neeraj Gaur (Google)
         Pedro Moreno (Google)
         Rohit Prabhavalkar (Google)
         Zhongdi Qu (Google)
         Austin Waters (Google)
Abstract: Conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that maps the resulting text (or the top-N hypotheses) to a set of domains, intents, and arguments. These modules are typically optimized independently. In this paper, we formulate audio-to-semantics understanding as a sequence-to-sequence problem. We propose and compare several encoder-decoder approaches that optimize both modules jointly, in an end-to-end manner. Evaluations on a real-world task show that (1) an intermediate text representation is crucial for the quality of the predicted semantics, especially the intent arguments, and (2) jointly optimizing the full system improves overall prediction accuracy. Compared to independently trained models, our best jointly trained model achieves similar intent prediction F1 scores but reduces argument word error rate by 18% relative.
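The headline result is a relative reduction in argument word error rate (WER). The abstract does not define how argument WER is computed, so the sketch below assumes the standard definition: word-level Levenshtein distance (substitutions, insertions, deletions) between reference and hypothesized argument strings, summed over the test set and divided by the total number of reference words. The function names and example argument strings are hypothetical and only illustrate the arithmetic, including what "18% relative" means (a hypothetical baseline of 10% WER would drop to 8.2%, not to -8%).

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, insertions and deletions
    needed to turn `ref` into `hyp` (Levenshtein distance over words)."""
    # prev[j] = distance between the first i-1 ref words and the first j hyp words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete reference word r
                cur[j - 1] + 1,          # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (free if the words match)
            )
        prev = cur
    return prev[-1]


def word_error_rate(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level WER: total word edits divided by total reference words.
    Assumption: this is how argument WER is aggregated."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative improvement; 0.18 corresponds to an 18% relative reduction."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical reference and predicted argument strings.
    refs = ["seven am", "pizza hut"]
    hyps = ["seven pm", "pizza hut"]
    print(word_error_rate(refs, hyps))  # 0.25 (1 substitution / 4 reference words)
    print(round(relative_reduction(0.10, 0.082), 2))
```

Measuring the gain relative to the baseline, rather than in absolute percentage points, keeps results comparable across test sets whose baseline WERs differ.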