Presentation #: 2
Session: ASR II
Location: Kallirhoe Hall
Session Time: Thursday, December 20, 13:30 - 15:30
Presentation Time: Thursday, December 20, 13:30 - 15:30
Presentation: Poster
Topic: Speech recognition and synthesis
Paper Title: SPEECH CHAIN FOR SEMI-SUPERVISED LEARNING OF JAPANESE-ENGLISH CODE-SWITCHING ASR AND TTS
Authors: Sahoko Nakayama, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura, Nara Institute of Science and Technology, Japan
Abstract:
Code-switching speech, in which speakers alternate between two or more languages within the same utterance, often occurs in bilingual and multilingual communities. This phenomenon poses challenges for spoken language technologies such as automatic speech recognition (ASR) and text-to-speech synthesis (TTS), as these systems must handle input in a multilingual setting. Code-switching text or code-switching speech can be found in social media, but parallel speech and transcriptions of code-switching data suitable for training ASR and TTS are mostly unavailable. In this paper, we utilize a speech chain framework based on deep learning to enable ASR and TTS to learn code-switching in a semi-supervised fashion. In particular, we build our system on Japanese-English conversational speech. We first train the ASR and TTS systems separately on parallel speech-text monolingual data (supervised learning), then perform speech chain training with only code-switching text or code-switching speech (unsupervised learning). Experimental results reveal that this closed-loop architecture allows ASR and TTS to teach each other without any parallel code-switching data, successfully improving performance to a level close to that of a system trained on sizeable parallel text-speech code-switching data.
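To illustrate the closed-loop structure the abstract describes, here is a minimal, hypothetical sketch of the speech chain training loop. The `ToyASR`/`ToyTTS` classes and the table-lookup "models" are stand-ins invented for this example (the paper uses deep sequence-to-sequence networks); "speech" is reduced to a list of integer frames so the loop is runnable without any ML framework:

```python
# Hypothetical toy sketch of the speech-chain closed loop.
# ASR maps speech frames to characters; TTS maps characters to frames.

class ToyASR:
    def __init__(self):
        self.table = {}  # frame -> character

    def transcribe(self, speech):
        return "".join(self.table.get(f, "?") for f in speech)

    def train(self, speech, text):
        # Supervised update: memorize frame->character pairs.
        for f, c in zip(speech, text):
            self.table[f] = c


class ToyTTS:
    def __init__(self):
        self.table = {}  # character -> frame

    def synthesize(self, text):
        return [self.table.get(c, -1) for c in text]

    def train(self, text, speech):
        # Supervised update: memorize character->frame pairs.
        for c, f in zip(text, speech):
            self.table[c] = f


asr, tts = ToyASR(), ToyTTS()

# Stage 1 (supervised): parallel speech-text pairs of monolingual data.
paired = [([1, 2], "ab"), ([3], "c")]
for speech, text in paired:
    asr.train(speech, text)
    tts.train(text, speech)

# Stage 2 (unsupervised): text-only code-switching data.
# TTS synthesizes pseudo-speech, which supervises ASR; with
# speech-only data the roles are reversed, closing the loop.
for text in ["abc", "cab"]:
    pseudo_speech = tts.synthesize(text)
    asr.train(pseudo_speech, text)

print(asr.transcribe([1, 2, 3]))  # -> "abc"
```

The key design point mirrored here is that neither model ever sees parallel code-switching data: each model's output becomes the other's training target.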