Paper Detail

Presentation #2
Session: ASR II
Session Time:Thursday, December 20, 13:30 - 15:30
Presentation Time:Thursday, December 20, 13:30 - 15:30
Presentation: Poster
Topic: Speech recognition and synthesis
Paper Title: SPEECH CHAIN FOR SEMI-SUPERVISED LEARNING OF JAPANESE-ENGLISH CODE-SWITCHING ASR AND TTS
Authors: Sahoko Nakayama, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura; Nara Institute of Science and Technology
Abstract: Code-switching speech, in which speakers alternate between two or more languages within the same utterance, often occurs in bilingual and multilingual communities. This phenomenon poses challenges for spoken language technologies, i.e., automatic speech recognition (ASR) and text-to-speech synthesis (TTS), as the systems need to handle input in a multilingual setting. Code-switching text or code-switching speech can be found in social media, but parallel speech and transcriptions of code-switching data suitable for training ASR and TTS are mostly unavailable. In this paper, we utilize a speech chain framework based on deep learning to enable ASR and TTS to learn code-switching in a semi-supervised fashion. In particular, we construct our system on Japanese-English conversational speech. We first train the ASR and TTS systems separately on parallel speech-text monolingual data (supervised learning) and then perform the speech chain with only code-switching text or code-switching speech (unsupervised learning). Experimental results reveal that this closed-loop architecture allows ASR and TTS to teach each other without any parallel code-switching data, and successfully improves performance to a level close to that of a system trained with sizeable parallel text-speech code-switching data.
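
The closed-loop training described in the abstract can be illustrated with a minimal sketch. The snippet below is a hypothetical illustration, not the authors' implementation: the linear stand-ins for the ASR and TTS models, the feature dimensions, and the reconstruction loss are all assumptions made for brevity. It only shows the two unsupervised directions of the speech chain: TTS synthesizes speech from unpaired text for ASR to learn from, and ASR transcribes unpaired speech for TTS to learn from.

import torch
import torch.nn as nn

# Assumed toy dimensions for speech features and text embeddings.
FEAT_DIM, TXT_DIM = 40, 32

# Stand-ins for full ASR and TTS networks (assumed, for illustration only).
asr = nn.Linear(FEAT_DIM, TXT_DIM)
tts = nn.Linear(TXT_DIM, FEAT_DIM)
opt = torch.optim.Adam(list(asr.parameters()) + list(tts.parameters()), lr=1e-3)
mse = nn.MSELoss()

def chain_step(unpaired_text=None, unpaired_speech=None):
    """One unsupervised speech-chain update on unpaired (code-switching) data."""
    opt.zero_grad()
    loss = torch.zeros(())
    if unpaired_text is not None:
        # Text-only branch: TTS synthesizes speech; ASR learns to recover the text.
        synth = tts(unpaired_text).detach()  # freeze TTS as the "teacher" here
        loss = loss + mse(asr(synth), unpaired_text)
    if unpaired_speech is not None:
        # Speech-only branch: ASR transcribes; TTS learns to reconstruct the speech.
        hyp = asr(unpaired_speech).detach()  # freeze ASR as the "teacher" here
        loss = loss + mse(tts(hyp), unpaired_speech)
    loss.backward()
    opt.step()
    return loss.item()

# Usage with random stand-ins for unpaired text/speech batches:
text_batch = torch.randn(8, TXT_DIM)
speech_batch = torch.randn(8, FEAT_DIM)
print(chain_step(unpaired_text=text_batch, unpaired_speech=speech_batch))

In this sketch, detaching the teacher model's output in each branch means each update trains only the student model on the other model's hypothesis, which is the intuition behind the two models "teaching each other" without parallel code-switching data. The supervised monolingual pre-training stage mentioned in the abstract is omitted here.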