
SLT 2018

Paper Detail

Presentation #6
Session: Voice Conversion and TTS
Session Time: Friday, December 21, 10:00 - 12:00
Presentation Time: Friday, December 21, 10:00 - 12:00
Presentation: Poster
Topic: Speech recognition and synthesis
Paper Title: IMPROVING FFTNET VOCODER WITH NOISE SHAPING AND SUBBAND APPROACHES
Authors: Takuma Okamoto; National Institute of Information and Communications Technology 
 Tomoki Toda; Nagoya University 
 Yoshinori Shiga; National Institute of Information and Communications Technology 
 Hisashi Kawai; National Institute of Information and Communications Technology 
Abstract: Compared with the WaveNet vocoder, the FFTNet vocoder can synthesize speech waveforms in real time, but its synthesized speech quality is lower. To improve the synthesized speech quality of the FFTNet neural vocoder while keeping a network model size that allows real-time synthesis, this paper proposes four approaches. 1) Residual connections are introduced into FFTNet to improve prediction accuracy. 2) Noise shaping and 3) subband approaches, which can significantly improve synthesized speech quality in the WaveNet vocoder, are applied directly to the FFTNet vocoder. 4) A subband FFTNet vocoder with multiband input is additionally proposed to directly compensate for the phase shift between subbands. The proposed approaches are evaluated in both objective and subjective experiments using a Japanese male corpus with a sampling frequency of 16 kHz, and are compared with STRAIGHT without mel-cepstral compression and with the vanilla FFTNet and WaveNet vocoders. The results indicate that the proposed approaches successfully improve the synthesized speech quality of the FFTNet vocoder. In particular, the proposed approach with noise shaping significantly outperforms STRAIGHT.