Presentation #: | 8 |
Session: | Deep Learning for Speech Synthesis |
Session Time: | Tuesday, December 18, 14:00 - 17:00 |
Presentation Time: | Tuesday, December 18, 14:00 - 17:00 |
Presentation: | Invited talk, Discussion, Oral presentation, Poster session |
Topic: | Speech recognition and synthesis |
Paper Title: | Multi-Scale Alignment and Contextual History for Attention Mechanism in Sequence-to-Sequence Model |
Authors: |
| Andros Tjandra; Nara Institute of Science and Technology |
| Sakriani Sakti; Nara Institute of Science and Technology |
| Satoshi Nakamura; Nara Institute of Science and Technology |
Abstract: |
A sequence-to-sequence model is a neural network module that maps between two sequences of different lengths. It has three core modules: encoder, decoder, and attention. Attention is the bridge that connects the encoder and decoder modules and improves model performance on many tasks. In this paper, we propose two ideas for improving sequence-to-sequence performance by enhancing the attention module. First, we maintain a history of the alignment location and the expected context from several previous time-steps. Second, we apply multi-scale convolution over several previous attention vectors to the current decoder state. We apply our proposed framework to sequence-to-sequence speech recognition and text-to-speech systems. The results reveal that our proposed extensions significantly improve performance compared to a standard attention baseline. |
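The two extensions described in the abstract can be sketched concretely. Below is a minimal sketch, assuming PyTorch; the module name `MultiScaleHistoryAttention`, the history length K, the kernel sizes, and the channel counts are illustrative assumptions, not the authors' published implementation. It keeps the K previous alignment and context vectors, runs 1-D convolutions at several kernel sizes over the alignment history (the multi-scale part), and folds both history summaries into the attention score for the current decoder state.

```python
# Hypothetical sketch of multi-scale alignment + contextual-history attention.
# Names, history length K, and kernel sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleHistoryAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim, history_len=3,
                 kernel_sizes=(3, 5, 7), conv_channels=8):
        super().__init__()
        self.history_len = history_len  # K previous time-steps to keep
        self.query_proj = nn.Linear(dec_dim, attn_dim, bias=False)
        self.key_proj = nn.Linear(enc_dim, attn_dim, bias=False)
        # One 1-D convolution per scale over the stacked alignment history
        self.convs = nn.ModuleList([
            nn.Conv1d(history_len, conv_channels, k, padding=k // 2)
            for k in kernel_sizes
        ])
        self.loc_proj = nn.Linear(conv_channels * len(kernel_sizes),
                                  attn_dim, bias=False)
        # Summarize the expected-context (previous context vectors) history
        self.ctx_proj = nn.Linear(history_len * enc_dim, attn_dim, bias=False)
        self.score = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_out, align_hist, ctx_hist):
        # dec_state:  (B, dec_dim)      current decoder state
        # enc_out:    (B, T, enc_dim)   encoder outputs
        # align_hist: (B, K, T)         K previous alignment vectors
        # ctx_hist:   (B, K, enc_dim)   K previous context vectors
        loc = torch.cat([conv(align_hist) for conv in self.convs], dim=1)
        loc = self.loc_proj(loc.transpose(1, 2))          # (B, T, attn_dim)
        query = self.query_proj(dec_state).unsqueeze(1)   # (B, 1, attn_dim)
        hist = self.ctx_proj(ctx_hist.flatten(1)).unsqueeze(1)
        energy = self.score(torch.tanh(
            query + hist + self.key_proj(enc_out) + loc)).squeeze(-1)
        align = F.softmax(energy, dim=-1)                 # (B, T)
        context = torch.bmm(align.unsqueeze(1), enc_out).squeeze(1)
        return context, align
```

In use, the caller would maintain rolling buffers of the last K alignment and context vectors (zero-initialized for the first decoder steps) and pass them in at every step, appending the newly returned `align` and `context` afterward.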